ABSTRACT


These lectures survey attempts to apply computers directly to
high level languages using microprogrammed interpreters. The
motivation  for  such  work  is  to  achieve  language  implementations 
that  are  more  effective  in  some  measure  of  translation,  execution 
or  response  to  the  user  than  would  otherwise  be  obtained.  The 
implied  comparison  is  with  the  established  technique  of  compiling 
into  a fixed  general-purpose  machine  code  prior  to  execution.  It 
is  argued  that  while  substantial  benefits  can  be  expected  from 
microprogramming  it  does  not  represent  the  best  approach  to  design 
when  the  contributing  factors  are  analysed  in  a general  system 
context,  that  is  to  say  when  wide  performance  range,  multiple 
source languages, and stringent security requirements have to be
satisfied.  An  alternative  is  suggested,  using  a combination  of 
interpretation  and  a primitive  instruction  set  and  providing 
security  at  the  microprogram  level. 

The  early  lectures  review  the  history  and  terminology  of  micro- 
programmable  machines.  Knowledge  of  conventional  practice  is 
assumed.  Readers  already  experienced  in  microprogramming  should 
skip  rapidly  to  Lecture  3. 


1. MICROINSTRUCTION DESIGN

If we abandon the conventional machine code (at least temporarily)
as a means of defining the computer's function set, it is
necessary to fall back on the next level of description, i.e. the
microcode. A very extensive literature has grown up around that
subject in recent years, but I think it is true to say that no
commonly accepted theory or principles have emerged: that is the
consequence of rapid changes in the process of manufacturing
logical devices, which force a continual revision of the economics
of design. In the introductory lectures we shall study the
evolution of microprogrammed machines, but one can do little more
than present a collection of techniques. For detailed study of
application to machine language interpretation the student is
referred to Husson (1970), where an extensive bibliography to
1968 will be found, and to Boulaye (1971) for a shorter survey of
techniques. In the following notes I can do no more than provide
an outline of design principles and introduce terminology.

The branch of technology that enables a raw microprocessor to
interpret a given order code is termed 'microsystem design'. If
one machine is to interpret one order code it is a very localised
affair. If several machines must imitate two or three order codes,
the need for standard procedures and documentation arises: in the
major application areas this is treated very much as an extension
of the logic design. Tucker (1967) and Husson have written
informatively on that aspect of microsystems. However, high level
languages are not nearly as well defined as machine codes; they
are generally more complex, subject to greater variation, and
outside the control of any one laboratory. A survey by Rosin (1969)
highlights some of the difficulties involved. We shall
return to that subject in the last lecture, showing how it affects
machine design. For the time being, let us recall how a
microprogrammed machine handles the interpretation of a single
'target instruction set' or 'machine code'.

The first application of microprogramming as a formal technique
is generally attributed to the designers of EDSAC-2 at Cambridge
University, Wilkes (1958). It is a systematic way of controlling
the flow of signals through the data paths of a processing unit,
each path, or in some cases each function of the processor, being
determined by a bit in a microinstruction. If we regard the state
of the processor as defined by the assembly of registers and
control flip-flops, then a microinstruction determines a simple
transition from one state to another. The attraction of the technique
is that transformations of any complexity can be composed by
applying a sequence of microinstructions: the limitations imposed by ad
hoc control logic, which are apparent in the areas of machine
definition and construction, are greatly reduced. At a time when
relatively complex target instructions are thought to be the key
to greater machine efficiency, the introduction of microinstructions
obviously has great attraction.

The  source  of  microinstructions  is  a store,  which  will  be 
called  the  control  memory  in  the  present  context.  A single  bit 
in  the  microinstruction  can  control  the  transmission  of  an  entire 
field  from  one  register  along  several  parallel  paths  in  one 
processor  'cycle';  another  bit,  or  group  of  bits,  will  select  a 
destination  register  and  field.  It  is  fairly  easy  to  evolve  a 
requirement  for  fifty  or  more  bits  in  the  microinstruction  to 
control  the  possible  data  paths  in  the  processor. 

The  second  requirement  of  the  microinstruction  is  to  determine 
its  successor.  Application  of  a sequencing  rule  determines  the 
string  of  actions  carried  out  by  the  processor  which,  when  properly 
defined,  will  interpret  a target  instruction.  One  of  the  simplest 
ways of sequencing is to place the next microinstruction address
in the one currently being obeyed. To achieve conditional branching
effects it is necessary to use the state of the processing
logic in the calculation of at least part of the next address.
The  elements  of  the  machine  can  be  visualised  as  in  Figure  1. 

The machine operates in three steps:

1. Access control memory using the microinstruction address.

2. Use the microinstruction to control the state transition
of the processor logic.

3. Use microinstruction digits and the result of step 2 to
determine the next microinstruction address.


Figure  1:  Microprogram  Control 
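The three-step cycle of Figure 1 can be sketched as a toy Python model. This is only an illustration under stated assumptions: the processor state is a dictionary of registers, each microinstruction carries a hypothetical `next_addr` field, and one condition produced in step 2 may be ORed into the low-order bit of the next address. None of these names come from a real machine.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]

@dataclass(frozen=True)
class MicroInstruction:
    action: Callable[[State], bool]    # step 2: state transition, returns a condition
    next_addr: int                     # hypothetical next-address field (low bit 0)
    branch_on_condition: bool = False  # if set, the condition supplies the low bit

def run(control_memory: Dict[int, MicroInstruction], state: State,
        start: int = 0, halt: int = -1, max_steps: int = 1000) -> State:
    addr = start
    for _ in range(max_steps):
        if addr == halt:
            return state
        mi = control_memory[addr]                 # step 1: access control memory
        cond = mi.action(state)                   # step 2: obey the microinstruction
        low_bit = 1 if (mi.branch_on_condition and cond) else 0
        addr = mi.next_addr | low_bit             # step 3: form the next address
    raise RuntimeError("microprogram did not halt")

# Example microprogram (hypothetical): A := A + B, repeated C times.
def c_is_zero(s: State) -> bool:
    return s["C"] == 0

def add_and_count(s: State) -> bool:
    s["A"] += s["B"]
    s["C"] -= 1
    return False

HALT = 3
CONTROL_MEMORY = {
    0: MicroInstruction(c_is_zero, next_addr=2, branch_on_condition=True),  # to 2, or 3 when C == 0
    2: MicroInstruction(add_and_count, next_addr=0),
}

if __name__ == "__main__":
    print(run(CONTROL_MEMORY, {"A": 0, "B": 5, "C": 3}, halt=HALT))  # {'A': 15, 'B': 5, 'C': 0}
```

ORing a condition into a next address whose low bit is zero is just one simple way of realising step 3; the point of the sketch is that the successor depends on both the microinstruction digits and the processor state.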
The development of microprogrammable machines from the above
principle of design leads to great elaboration of detail, the
main considerations being (a) optimising the use of control
memory, (b) achieving balanced timing of control memory and
processor logic, and (c) organising the registers and data paths
of the processor to suit the class of target machines of interest.
I shall discuss each aspect of design, giving examples from some
of the earlier microprogrammed machines.

1.1  Minimising  the  Cost  of  Control  Memory 

Exploitation of microprogramming was not widespread until
suitable techniques for loading and manufacturing control memory
had been developed. Such techniques are discussed by Husson
(Chapter 5), where it can be seen that the predominant forms of
construction allowed microinstructions to be read but not written
under program control. That is clearly sufficient for a well
defined and fixed instruction set. The later development of
semiconductor control memories with write capability has been
the main stimulus to further research in microprogram application.
With all memories, however, the main design requirement is to
deliver the information required at the right time and in as few
bits as possible.

Considerations of space lead to various forms of microinstruction
coding. The form in which a single microinstruction bit
controls a unique processor gate (or data path) is termed direct
control. If we can find sets of mutually exclusive control
signals, such that not more than one is activated in a given
cycle, it is possible to encode them: a field of $K$ bits will
activate one of $2^K - 1$ control lines, or none at all. That is
obviously the case when one of, say, 8 registers can be gated to
one input of an adder. The same technique is used in machine
code design. It is illustrated below by the structure of the
IBM 360/30 microinstruction and by most of the 'first generation'
microcodes, all of which may be said to use encoded control, the
individual fields controlling microorders.
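The saving from encoding mutually exclusive signals can be shown with a small sketch. It assumes code 0 is reserved for "no line active", so a field selecting among n lines needs enough bits for n + 1 codes; the function names are illustrative only.

```python
from typing import List

def encoded_width(n_lines: int) -> int:
    """Bits needed to select one of n_lines mutually exclusive lines, or none.

    Code 0 is reserved for 'no line active', so n_lines + 1 codes are needed.
    """
    return n_lines.bit_length()

def decode(field: int, n_lines: int) -> List[bool]:
    """Expand an encoded field into direct-control signals (at most one True)."""
    if not 0 <= field <= n_lines:
        raise ValueError(f"code {field} is not assigned")
    return [field == line + 1 for line in range(n_lines)]

# Gating one of 8 registers to an adder input: 8 bits under direct control,
# 4 bits when encoded (codes 1..8 select a register, code 0 selects none).
if __name__ == "__main__":
    print(encoded_width(8), decode(3, 8))
```

With 7 registers the field shrinks to 3 bits, which is why designers try to keep each group of mutually exclusive signals just under a power of two.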

Three other common forms of coding deserve mention. In
bit-steering the particular control lines activated by a microorder
(or bit) are determined by another field of the microinstruction.
The second field directs the first to one or another set of
control lines; it is appropriate when the processor logic can be
partitioned into sections that do not require activation on every
cycle (and can to some degree proceed in parallel). It has been
used in combination with other techniques, for example in the RCA
Spectra 70/45, Honeywell 4200 and IBM 360/25. Carried to the
extreme, the microinstruction ends up as a function group and a
number of operand fields, which would be difficult to distinguish
at first sight from a conventional machine code.
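Bit-steering can be sketched as a shared direct-control field whose destination is chosen by a steering field. The two control-line groups below are hypothetical; the sketch only shows that 5 bits (1 steering bit plus 4 control bits) can drive 8 lines, provided the two groups never need activating in the same cycle.

```python
from typing import Dict, Tuple

# Hypothetical control-line groups; a real machine would name its own gates.
GROUPS: Dict[int, Tuple[str, ...]] = {
    0: ("shift_left", "shift_right", "rotate", "double_length"),
    1: ("mem_read", "mem_write", "io_start", "io_stop"),
}

def steer(steering: int, field: int) -> Dict[str, bool]:
    """Route a 4-bit direct-control field to the group chosen by the steering field.

    Lines in the group that is not selected stay inactive this cycle.
    """
    active = {name: False for names in GROUPS.values() for name in names}
    for bit, name in enumerate(GROUPS[steering]):
        active[name] = bool((field >> bit) & 1)
    return active

if __name__ == "__main__":
    print(steer(0, 0b0101))   # shift_left and rotate active, memory group idle
```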
The second technique derives from the observation that over
many sequences of microinstructions the values of certain control
lines will remain constant; they can therefore be set in advance
and taken as an implicit extension of the microinstruction. That
technique will be referred to as preset control. It applies, for
example, if particular carry or shift paths are fixed in advance,
or if one of several possible register sets is being used.
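Preset control can be sketched as a control register consulted by every microorder. The `Preset` fields below (a register-bank selector and a carry-in path) are hypothetical examples of the cases mentioned in the text; the same `add` microorder behaves differently under different presets without carrying those bits itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Preset:
    # Hypothetical preset fields: set once, then implied by every microinstruction.
    register_set: int = 0
    carry_in: int = 0

@dataclass
class Host:
    preset: Preset = field(default_factory=Preset)
    banks: List[List[int]] = field(default_factory=lambda: [[0] * 4 for _ in range(2)])

    def add(self, dst: int, a: int, b: int) -> None:
        """8-bit add microorder; register numbers are relative to the preset bank."""
        regs = self.banks[self.preset.register_set]
        regs[dst] = (regs[a] + regs[b] + self.preset.carry_in) & 0xFF

if __name__ == "__main__":
    h = Host()
    h.banks[0][1], h.banks[0][2] = 200, 100
    h.add(0, 1, 2)                        # bank 0, no carry-in: (200 + 100) mod 256
    h.preset = Preset(register_set=1, carry_in=1)
    h.banks[1][1], h.banks[1][2] = 2, 3
    h.add(0, 1, 2)                        # same microorder, now bank 1 with carry-in
    print(h.banks[0][0], h.banks[1][0])   # 44 6
```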

Finally, it is easy to see that not all $2^{100}$ versions of a 100-bit
direct control microinstruction will be used, and instead of
attempting to encode individual fields it would be possible to
list all the distinct microinstructions in a particular application
and select those required by indexing a store containing the list.
For example, in a particular application there may be fewer than
1024 distinct microinstructions. In that case a 2000-word
microprogram (200 000 bits) can be compressed into 20 000 bits, each
word becoming a 10-bit index, a saving of 90%. All that is required
is that the fully encoded microinstruction index another store
100 bits wide containing the 1024 fully decoded instructions (the
second store is called the nanostore). The nanostore itself occupies
1024 x 100 = 102 400 bits, giving a total of 122 400 bits, so the
net saving in storage space is about 40%.
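The storage arithmetic above can be checked with a short calculation. The function simply restates the figures in the text (2000 words, 100 bits wide, 1024 distinct words); it assumes the index width is the smallest that can address every distinct word.

```python
def storage_bits(words: int, width: int, distinct: int) -> dict:
    """Compare direct storage of a microprogram with a micro/nano two-level scheme."""
    index_bits = (distinct - 1).bit_length()   # 1024 distinct words -> 10-bit index
    direct = words * width                      # every word held in full
    micro = words * index_bits                  # microprogram of nanostore indices
    nano = distinct * width                     # one full copy of each distinct word
    return {
        "direct": direct,
        "micro": micro,
        "nano": nano,
        "micro_saving": 1 - micro / direct,
        "net_saving": 1 - (micro + nano) / direct,
    }

if __name__ == "__main__":
    r = storage_bits(words=2000, width=100, distinct=1024)
    print(r["direct"], r["micro"], r["nano"])                       # 200000 20000 102400
    print(round(r["micro_saving"], 3), round(r["net_saving"], 3))   # 0.9 0.388
```

The net saving shrinks as the number of distinct words approaches the length of the microprogram, since the nanostore then holds nearly everything the direct store would.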


It is more likely that some of the fields of the microinstruction
will be fully used, leaving a residual field to be handled
in the above way. The Nanodata QM-1 machine, Rosin et al (1972),
provides an illustration. The 16-bit microinstruction is loaded
into one of the microregisters; a six-bit field is then used to
select a 342-bit nanoinstruction. The latter can use the remaining
ten microinstruction bits as operand selectors, so it is
appropriate to regard them as a form of preset nanocontrol
(Figure 2). At this point the designer faces the same set of
choices at nanomachine level as we have already discussed in
connection with micromachines. He could use direct control: in
fact, QM-1 does not, but obeys a far more elaborate sequence of
nanoorders. The reader is referred to the literature for details.


Figure  2:  Nanoprogram  Control 
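The two-level scheme of Figure 2 can be sketched as follows. Only the 16-bit word, the 6-bit nanostore index and the 10 remaining operand bits come from the text; the split of those 10 bits into two 5-bit register selectors, the 16-bit registers and the two nanoroutines are hypothetical, and the QM-1's real nanoinstruction sequencing is far more elaborate.

```python
from typing import Callable, Dict, List, Tuple

Regs = List[int]
NanoRoutine = Callable[[Regs, int, int], None]

def nano_move(r: Regs, x: int, y: int) -> None:
    r[x] = r[y]

def nano_add(r: Regs, x: int, y: int) -> None:
    r[x] = (r[x] + r[y]) & 0xFFFF

# Hypothetical nanostore contents, indexed by the 6-bit field.
NANOSTORE: Dict[int, NanoRoutine] = {0: nano_move, 1: nano_add}

def encode(op: int, x: int, y: int) -> int:
    return (op << 10) | (x << 5) | y

def split(micro_word: int) -> Tuple[int, int, int]:
    op = (micro_word >> 10) & 0x3F   # 6-bit nanostore index
    x = (micro_word >> 5) & 0x1F     # assumed 5-bit operand selector
    y = micro_word & 0x1F            # assumed 5-bit operand selector
    return op, x, y

def execute(micro_word: int, regs: Regs) -> None:
    op, x, y = split(micro_word)
    NANOSTORE[op](regs, x, y)        # operand bits act as preset nanocontrol

if __name__ == "__main__":
    regs = [0] * 32
    regs[2], regs[3] = 7, 5
    execute(encode(1, 2, 3), regs)   # r2 := r2 + r3
    execute(encode(0, 4, 2), regs)   # r4 := r2
    print(regs[2], regs[4])          # 12 12
```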
1.2 Timing and Control Considerations
It will be shown later that interpreting one of the common
target instructions takes approximately 20 microorders and two
main memory cycles. If a premium is placed on memory utilisation
it follows that the effective microorder rate must be ten times
that of main memory: to achieve that, the early machines use a
horizontal or multi-order microinstruction that activates between
five and ten processor paths in parallel. The microinstruction
rate is synchronised to 1/2 or 1/3 of the memory cycle time, so that a
1.5 µsec core memory would be associated with a 750 nsec or 500 nsec
microinstruction cycle. Horizontal coding achieves speed at the
expense of generality and ease of programming: in the next
lecture we shall introduce a more 'relaxed' form of code in which
each microinstruction contains only one or two microorders, which
is naturally called vertical control.
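The timing argument can be restated as arithmetic. The sketch assumes nothing beyond the figures in the text (20 microorders and 2 memory cycles per target instruction, a 1.5 µsec memory) and ignores overlap: it gives the microinstruction cycle and how many microorders each horizontal microinstruction must carry to keep memory busy.

```python
from fractions import Fraction
from typing import Tuple

def horizontal_timing(memory_cycle_ns: int, microinstructions_per_memory_cycle: int,
                      microorders_per_instruction: int = 20,
                      memory_cycles_per_instruction: int = 2) -> Tuple[Fraction, Fraction]:
    """Return (microinstruction cycle in ns, microorders needed per microinstruction)."""
    cycle_ns = Fraction(memory_cycle_ns, microinstructions_per_memory_cycle)
    orders_per_memory_cycle = Fraction(microorders_per_instruction,
                                       memory_cycles_per_instruction)   # 20 / 2 = 10
    return cycle_ns, orders_per_memory_cycle / microinstructions_per_memory_cycle

if __name__ == "__main__":
    for k in (2, 3):
        ns, orders = horizontal_timing(1500, k)
        print(f"1/{k} of 1.5 usec: {ns} nsec cycle, {float(orders):.2f} microorders each")
```

At half the memory cycle each microinstruction must carry about five microorders; at a third, about three and a third.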

The  elementary  steps  of  the  machine  execution  cycle  have 
already been indicated. If no overlap is attempted then the
major components, control memory and processor, are alternately
idle while the other completes its task (remember that read-only
memories, and even writable semiconductor memories, may require
very little time to recover for the next cycle). In order to
achieve  higher  performance  it  is  necessary  to  use  faster  and 
therefore  more  expensive  components,  or  to  overlap  the  elementary 
steps.  The  options  are  superficially  the  same  as  in  machine  code 
design.  The  main  differences  derive  from  the  fact  that  micro- 
programs have  been  for  the  most  part  fixed,  comparatively  small, 
and  have  made  extensive  use  of  multiway  branch  or  switch  instruc- 
tions: the  alternative  of  using  a sequence  of  tests  to  decode 
a target  instruction  would  simply  be  too  slow. 

A control  memory  address  is  frequently  composed  from  several 
fields  whose  values  are  determined  at  different  points  in  the 
machine  cycle.  The  high  order  fields  are  normally  known  first, 
so  the  construction  of  an  address  reflects  a gradual  narrowing 
down  of  the  alternatives  until  the  exact  microinstruction  can 
be  fetched. 

In  the  IBM  360/Model  30,  for  example,  a block  address  is 
found  as  part  of  the  preset  control,  not  normally  affected  by 
the  current  microinstruction;  a functional  branch  is  a field 
inserted  directly  from  the  microinstruction,  and  a switch  is  the 
low-order  two-bit  field  of  the  control  memory  address,  computed 
from  the  processor  state.  Thus,  the  successor  to  any  instruction 
is  within  the  current  block  of  256  (see  diagram)  and  may  be 
dependent  on  the  outcome  of  one  or  two  conditions  or  register 
values. 
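The address composition just described can be sketched with bit fields. The text fixes only the 256-word block and the 2-bit switch in the low-order position; the resulting 6-bit functional branch, and the choice of carry and zero as the two switch conditions, are assumptions made for illustration.

```python
def switch_bits(carry: bool, zero: bool) -> int:
    """Hypothetical choice of two processor conditions forming the 2-bit switch."""
    return (int(carry) << 1) | int(zero)

def next_address(block: int, functional_branch: int, switch: int) -> int:
    """Compose a control-memory address; the low 8 bits stay within a 256-word block.

    Assumed split of the in-block bits: 6-bit functional branch, 2-bit switch.
    """
    if not (0 <= functional_branch < 64 and 0 <= switch < 4):
        raise ValueError("field out of range")
    return (block << 8) | (functional_branch << 2) | switch

if __name__ == "__main__":
    addr = next_address(block=3, functional_branch=5, switch=switch_bits(True, False))
    print(addr, addr >> 8)   # 790 3
```

Because the block field comes from preset control, every successor the microinstruction can name lies in the same 256-word block.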
IBM 360/30 microinstruction address: BLOCK ADDRESS (preset) | FUNCTIONAL BRANCH (from microinstruction) | SWITCH (from processor logic)

We can now see more clearly when the overlap of processor and
control memory cycles can be achieved. If the control address is
determined by the processor state at the end of the current
microinstruction, then, although access might be initiated on the basis
of the block and functional branch fields, the final decision has to be
delayed until the state of the processor logic is known (the
example given above falls into that category).

If the control address is determined by the processor state at
the end of the previous microinstruction, then the control memory can
be accessed while obeying the current microinstruction, e.g.

TIME  N 


Previous 

Ulnst : 

yOBEY  / 

STATUS  1 

Current 

Vilnst: 

^ ACCESS 

/ ^BEY  / 

STATUS 

Next 

ylnst: 

ACCESS 

/ OBEY 
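A simple cost model shows what the overlap buys. It assumes fixed access and obey times and that a microinstruction's access overlaps its predecessor's obey; a "stall" is a successor whose address depends on the current microinstruction's own result, so its access cannot start early. The timings in the example are hypothetical.

```python
def serial_time(n: int, access: int, obey: int) -> int:
    """No overlap: control memory and processor take turns."""
    return n * (access + obey)

def overlapped_time(n: int, access: int, obey: int, stalls: int = 0) -> int:
    """Access of each microinstruction overlaps the obey of its predecessor.

    Each stall loses the overlap for one transition, adding min(access, obey).
    """
    if n == 0:
        return 0
    if not 0 <= stalls <= n - 1:
        raise ValueError("at most n - 1 transitions can stall")
    step = max(access, obey)
    return access + obey + (n - 1) * step + stalls * min(access, obey)

if __name__ == "__main__":
    # Hypothetical timings in nsec for a run of 10 microinstructions.
    print(serial_time(10, 300, 450))                 # 7500
    print(overlapped_time(10, 300, 450))             # 4800
    print(overlapped_time(10, 300, 450, stalls=3))   # 5700
```

When every transition stalls, the overlapped time falls back to the serial time, which is the sense in which frequent changes of direction defeat the overlap.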

The  timing  considerations  just  described  are  shared  with  very 
much  more  sophisticated  processors:  they  result  from  any  attempt 
to  overlap  one  instruction  with  others  and  it  is  easy  to  see  that 
the more 'changes in direction' in the flow of control the less
effective  are  the  overlap  arrangements.  It  is  true  to  say  that 
microprogram  is  more  afflicted  by  conditional  and  computed 
branches  than  machine  language  program,  for  which  reason  designers 
are  reluctant  to  throw  away  the  contents  of  the  micropipeline  and 
may  ask  the  coder  to  deal  with  various  'run-on'  conditions.  What 
this  means  in  practice  is  that  one  or  two  instructions  in  written 
sequence after a branch may be obeyed, e.g. in decoding a
hypothetical target instruction the microsequence is written:

m1 : Extract function field

m2 : Branch to address + function

m3 : Increment target instruction counter

Here, although the branch m2 is taken, the following microinstruction
is still obeyed. It is in avoiding or dealing with such
coding peculiarities, and in taking account of critical memory or
I-O timing constraints, that microprogramming differs from
conventional coding, or has done so in the past. Luckily, increasing
hardware power has removed many of the characteristics of
microprogram from modern machines; perhaps the only positive way in
which a microprocessor can be distinguished from a 'mini' is in
its dedication to the task of modelling processors rather than
users' problems.
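The 'run-on' behaviour of the m1, m2, m3 sequence can be modelled with a one-slot delayed branch. This is a sketch under assumptions: a taken branch takes effect only after the next microinstruction in written sequence has been obeyed, no branch sits in a delay slot, and the semantic routines starting at a hypothetical address `BASE` are placeholders.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
Step = Tuple[str, Callable[[State], Optional[int]]]   # (label, action -> branch target)

def run_with_delay_slot(program: List[Step], state: State, max_steps: int = 100) -> List[str]:
    """Obey microinstructions; a taken branch takes effect after one more is obeyed."""
    pc, pending, trace = 0, None, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        label, action = program[pc]
        trace.append(label)
        target = action(state)
        pc = pending if pending is not None else pc + 1   # run-on: slot obeyed first
        pending = target
    return trace

BASE = 4   # hypothetical start of the semantic routines

def m1(s: State) -> Optional[int]:   # extract function field
    s["fn"] = s["instr"] >> 4
    return None

def m2(s: State) -> Optional[int]:   # branch to address + function
    return BASE + s["fn"]

def m3(s: State) -> Optional[int]:   # increment target instruction counter
    s["tic"] += 1
    return None

def semantic(n: int) -> Callable[[State], Optional[int]]:
    def routine(s: State) -> Optional[int]:
        s["done"] = n
        return None
    return routine

PROGRAM: List[Step] = [("m1", m1), ("m2", m2), ("m3", m3), ("never obeyed", semantic(-1)),
                       ("sem0", semantic(0)), ("sem1", semantic(1))]

if __name__ == "__main__":
    st = {"instr": 0x12, "fn": 0, "tic": 0}
    print(run_with_delay_slot(PROGRAM, st), st["tic"])   # ['m1', 'm2', 'm3', 'sem1'] 1
```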


1.3  Highway  and  Register  Organization 


The basic requirements for imitating a given target instruction
set are:

(a) arithmetic primitives for composing the arithmetic,
logical and addressing functions of the target machine;

(b) memory mapping and resolution compatible with the store
structure of the target machine;

(c) imitation of the internal control states, registers and
register access requirements of the target machine;

and (d) peripheral interfaces that reflect the formats, status
and timing expected by the target machine.


Within this field the degree of dedication varies with the
performance/cost objective. Different design teams have gone
about the same task in quite different ways: Husson (p. 414) makes
the point that although the IBM 360 and RCA Spectra 70 achieve the
same architecture, the latter is a much more 'specific' design
than the IBM models.


In  this  subsection  I shall  illustrate  features  of  micropro- 
cessor design  referring  to  the  IBM  360/Model  30  which  was  one  of 
the  earliest  models  of  the  IBM  360  range  and,  as  it  happens,  the 
subject  of  an  early  experiment  in  language  oriented  design  that 
I shall  refer  to  later.  Further  details  will  be  found  in  Boulaye 
(1971)  and  Weber  (1967). 


Figure  3 shows  the  data  paths  in  the  central  processor  of  the 
IBM  360/Model  30.  There  are  twelve  registers,  each  of  one  byte. 
Apart  from  the  main  memory  address  and  data  buffers  (MN  and  R)  no 
specific  allocation  of  content  is  made  by  hardware.  The  data 
paths  are  uniformly  8 bits.  The  microinstruction  is  60  bits 
long,  encoded  into  the  following  microorder  groups: 
(i) Store access: fields CM, CN, CU

(ii) Data flow: 4-bit literal field CK

(iii) ALU control: CA, CF, CB, CG, CV, CD

(iv) Sequencing: CH, CL

(v) Status: CS


Figure 3: IBM 360/Model 30 Data Paths (labels: STORE DATA BUS; MAIN & LOCAL MEMORY)


For example, under group (i):

CM (3 bits) indicates: no action; read from address IJ, UV or LT
to R; regenerate; or write from R

CU (2 bits) selects main or local (register) storage.

Under group (iii):

CA (4 bits) selects one of 10 inputs to the ALU through the
A register

CB (2 bits) selects one of R, L, D or the literal CKCK

CC (3 bits) selects the actual ALU function

CF (3 bits) modulates the A-input to the ALU, i.e. high digit,
low digit, none, low or cross-over

CG (2 bits) modulates the B-input to the ALU

CV (2 bits) selects true, complement or six-correct form of B

CZ (4 bits) gives the destination, one of ten registers.

Thus  in  one  microinstruction,  which  takes  750nsec,  an  8-bit 
arithmetic  or  logical  operation  is  carried  out,  half  a main  store 
cycle  is  controlled,  and  the  next  microinstruction  is  selected. 

In  the  next  cycle  the  main  store  operation  must  be  completed 
while  other  operations  are  carried  out. 
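Field-encoded microinstructions like this are naturally handled as packed bit fields. The sketch takes the field widths from the text but invents the bit order; it covers only the ten fields whose widths are given (29 of the 60 bits), so it is not IBM's actual layout.

```python
from typing import Dict, List, Tuple

# Widths from the text; the bit ORDER is a hypothetical layout, not IBM's.
LAYOUT: List[Tuple[str, int]] = [
    ("CK", 4), ("CM", 3), ("CU", 2), ("CA", 4), ("CB", 2),
    ("CC", 3), ("CF", 3), ("CG", 2), ("CV", 2), ("CZ", 4),
]

def pack(fields: Dict[str, int]) -> int:
    """Assemble a microinstruction word from named fields (missing fields = 0)."""
    word, shift = 0, 0
    for name, width in LAYOUT:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word

def unpack(word: int) -> Dict[str, int]:
    """Split a microinstruction word back into its named fields."""
    fields, shift = {}, 0
    for name, width in LAYOUT:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields

if __name__ == "__main__":
    mi = {"CK": 9, "CM": 1, "CU": 2, "CA": 7, "CB": 3,
          "CC": 5, "CF": 4, "CG": 1, "CV": 2, "CZ": 10}
    print(unpack(pack(mi)) == mi)   # True
```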

If we consider the loop of instructions which interprets the
target machine code, it clearly consists of first fetching the
instruction, then looking at the function/format digits and
preparing each operand by computing an address and accessing the store
when necessary, and then branching to the 'semantic' microsequence
that interprets the target function. The instruction will normally
terminate by servicing interrupts before proceeding to the next in
sequence. Elementary IBM 360 instructions take between 15 and 30
µsecs in execution, i.e. 20-40 microinstructions: the large number
reflects the fact that any address or arithmetic calculation
involving operands of more than 8 bits has to be carried out
serially by byte.
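Why 8-bit data paths inflate the microinstruction count can be seen from a byte-serial addition. This sketch counts only the 8-bit ALU steps for a 32-bit add, ignoring operand fetch, decoding and interrupt servicing, so a real microsequence is considerably longer.

```python
from typing import Tuple

def byte_serial_add(a: int, b: int, n_bytes: int = 4) -> Tuple[int, int, int]:
    """Add two n_bytes-wide operands one byte per ALU cycle, low-order byte first.

    Returns (sum modulo 2**(8*n_bytes), final carry, number of 8-bit ALU steps).
    """
    result, carry, steps = 0, 0, 0
    for i in range(n_bytes):
        s = ((a >> 8 * i) & 0xFF) + ((b >> 8 * i) & 0xFF) + carry
        result |= (s & 0xFF) << (8 * i)
        carry = s >> 8               # carry saved for the next byte's microinstruction
        steps += 1
    return result, carry, steps

if __name__ == "__main__":
    print(byte_serial_add(0xFFFFFFFF, 1))   # (0, 1, 4)
```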

In order to achieve higher performance the microregisters
and  internal  data  paths  must  be  more  closely  matched  to  those  of 
the  target  machine,  and  supplementary  functional  units  introduced 
to  minimise  the  'mismatch'  between  the  microprocessor  and  the 
target  system  architecture. 
2. GENERALIZED HOST MACHINES


We have seen some of the ways in which specific features are
built into microprogrammable machines to help in modelling
particular order codes. However, our main objective is to consider
systems at a level removed from machine code, where the target
instruction sets can to some extent be chosen to suit the available
hardware: in the last lecture we shall attempt to answer the question
of whether the need for specific adaptation will still arise.

I shall now discuss design generalisations that have been
favoured in recent years as the result of rapid reduction in the
cost of storage and logical devices. In the latter context
'regularity' of hardware is at least as important as circuit or
gate count, which is greatly to the benefit of the microprogrammer.

I shall refer to the class of processors under discussion as host
machines in order to suggest their role and to avoid undue emphasis
on 'microprogram' or 'microprocessor' technology. In practice,
the principal use of host machines has been in the form of
instruction set emulators (e.g. IBM 360 imitating the IBM 1401). The design
objective of producing a 'universal emulator' became feasible with
the introduction of writable control memories. It is clear from
the outset that machines capable of imitating any instruction set
at competitive speed could not be produced at competitive cost;
nevertheless, such a machine is invaluable as a vehicle for research
into computer architectures. The ICL Research Emulator El, Iliffe
and May (1972), the Standard Computer Corporation MLP-900, Rakocsi
(1972), the Stanford University EMMY, Neuhauser (1975), and the
Nanodata Corporation QM-1, Rosin et al (1972), provide examples
of generalised facilities, while in the commercial field the
Burroughs Corporation B-1700 is particularly interesting from the
point of view of memory allocation.

All  the  machines  in  this  category  use  vertical  instruction 
coding  which  allows  much  greater  flexibility  in  function  sequenc- 
ing than  the  older  horizontal  designs,  and  at  the  same  time  a 
simpler  and  more  familiar  form  of  program  input.  The  reader  may 
compare  the  example  of  microprogramming  given  in  Weber  (1967)  with 
the  program  style  of  any  of  the  machines  mentioned  above,  which 
bears  comparison  with  a conventional  assembly  program  listing 
except  for  the  primitive  nature  of  the  arithmetic,  the  absence 
of  address  modification,  and  the  elaborate  field  selection  and 
branching  functions. 

In moving to vertical coding it is normally the case that the
main memory system has a much higher data rate than the host needs,
even with the fastest control store. The extra capacity is used
in direct memory access by I-O devices, in dual processor
configurations, and in many instances by using the main memory as a
source of microinstructions. The last option is particularly
attractive because it affords an escape from the rigid limitation
on microprogram that is imposed by a separate control store. On
the other hand it does impose a control structure which is
difficult to rationalise: perhaps the simplest view is to look
upon the interpreter as providing system standards, operating
system interfaces, protection, etc., which are not normally present
at the microcontrol level.

The  following  subsections  correspond  to  the  main  design  areas 
noted  in  the  last  lecture,  with  illustrations  drawn  from  the 
machines  mentioned  above.  Further  examples  can  be  found  in  less 
readily  accessible  specifications  for  many  machines  currently  on 
the  market. 

2.1  Generalised  Arithmetic  and  Data  Paths 

One  of  the  obvious  ways  in  which  MSI  or  LSI  components  affect 
the  arithmetic  system  is  in  allowing  register  lengths  to  be 
standardised  at  a reasonably  high  value,  rather  than  making  use 
of  specialised  lengths  seen  in  earlier  machines.  The  effects  are 
to  speed  up  the  machine  and  to  save  control  memory,  because 
operations  previously  performed  by  a loop  of  microinstructions 
can  now  be  carried  out  in  one. 

The  host  is  still  specialised  with  regard  to  arithmetic  width 
and  shift  paths.  Two  methods  have  been  employed  for  variable 
precision  arithmetic  up  to  a prescribed  field  size: 

(i) using a third input to the ALU, which is in fact a mask allowing
carries to propagate. The SCC MLP-900 allows the microinstruction
to select one of 32 possible masks which can be
used to propagate carry to the 'normal' sign position. A
mask may also be used to permit operations on unpacked fields
such as 6-bit characters stored in byte positions. One of the
difficulties of working with unpacked data, however, is that
it may eventually have to be aligned to an external interface
such as the store address bus.

(ii) allowing the effective ALU width to be variable, i.e. taking
sign, carry and zero-test signals from any position of the
ALU. This method is used in the El emulator and the B-1700,
where the sign position is part of preset control. If more
than one arithmetic width is in use concurrently, it is desirable
to have more than one preset sign position, selected by
microinstruction.
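Both methods can be sketched in software. `masked_add` simulates a ripple-carry adder whose third input is a propagate mask (bit i set lets a carry out of position i enter position i+1), which is how a carry can be confined to 6-bit characters held in byte positions; `add_width` takes carry, sign and zero at a preset width. These are illustrative models, not the MLP-900 or B-1700 logic.

```python
from typing import NamedTuple

def masked_add(a: int, b: int, carry_mask: int, width: int) -> int:
    """Ripple-carry add in which carries cross position i only where carry_mask has bit i set."""
    result, carry = 0, 0
    for i in range(width):
        s = ((a >> i) & 1) + ((b >> i) & 1) + carry
        result |= (s & 1) << i
        carry = (s >> 1) & ((carry_mask >> i) & 1)
    return result

class AluResult(NamedTuple):
    value: int
    carry: bool
    sign: bool
    zero: bool

def add_width(a: int, b: int, width: int) -> AluResult:
    """Add within a preset width on a wider ALU; flags are taken at the width boundary."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    value = total & mask
    return AluResult(value, bool(total >> width), bool((value >> (width - 1)) & 1), value == 0)

if __name__ == "__main__":
    # Two 6-bit characters in byte positions 0-5 and 8-13: carries stay inside each field.
    print(hex(masked_add(0x3F01, 0x0101, carry_mask=0x1F1F, width=16)))   # 0x2
    print(add_width(0x3F, 1, 6))   # AluResult(value=0, carry=True, sign=False, zero=True)
```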

Variation  in  ALU  width  has  an  obvious  counterpart  in  shift 
functions.  To  reproduce  exactly  the  shift  patterns  of  a word  of 
arbitrary  length  it  is  necessary  to  preset  the  point  at  which  end 
connections  are  made,  which  is  more  difficult  to  engineer  than 
sign adjustment because a stream of bits is being handled. The
El emulator does allow shift lengths from one to 64 bits, but the
logic is expensive and most designers have settled for single or
double length shifts and rotations. For high level language
interpretation that is probably sufficient.
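Presetting the end connection amounts to rotating within an arbitrary word length. The sketch below is a software model of that behaviour for lengths up to anything Python integers can hold; it says nothing about how the hardware makes the connection.

```python
def rotate_left(value: int, amount: int, length: int) -> int:
    """Rotate within a word of `length` bits: bit length-1 is connected back to bit 0."""
    if length <= 0:
        raise ValueError("length must be positive")
    mask = (1 << length) - 1
    value &= mask
    amount %= length
    return ((value << amount) | (value >> (length - amount))) & mask

if __name__ == "__main__":
    print(bin(rotate_left(0b100001, 1, 6)))   # 0b11
```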
A final area where both the ALU and shifter are affected is in
the type of arithmetic carried out. The predominant types are
binary integer, decimal, and floating point. Generalised
facilities for the last are usually complex and of limited value
in either the commercial or research context. Decimal facilities
can be built into the ALU in varying degrees, from fully signed
operations down to facilities for detecting carries at the decimal
digit positions. The choice rests entirely on the final
cost/performance required. Although an important area of design, it can
be 'factored out' in comparative studies of language-oriented and
fixed instruction set machines, for which reason I shall not
extend the discussion at this point. It is important to remember
that if a host has good arithmetic facilities then any lapse in
handling the control or data access side of a language will be
conspicuous, and conversely.
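Detecting carries at decimal digit positions can be modelled by packed-BCD addition with six-correction, the same correction the 360/30's CV field offers on the B input. This is a sketch assuming valid BCD operands and one 4-bit digit per step.

```python
from typing import Tuple

def bcd_add(a: int, b: int, digits: int) -> Tuple[int, int]:
    """Add two packed-BCD numbers digit by digit; return (result, decimal carry out)."""
    result, carry = 0, 0
    for i in range(digits):
        d = ((a >> 4 * i) & 0xF) + ((b >> 4 * i) & 0xF) + carry
        if d > 9:
            d += 6                  # six-correction: skip the unused codes 10..15
        carry = d >> 4              # carry detected at the decimal digit position
        result |= (d & 0xF) << (4 * i)
    return result, carry

if __name__ == "__main__":
    print(hex(bcd_add(0x1234, 0x5678, 4)[0]))   # 0x6912
```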

If the path from memory is not selective enough (and it usually
is not), facilities are required for extracting fields from
microregisters for input to the ALU. Such facilities are expensive and
may be confined to limited field selection or to particular
registers (e.g. in the shift unit). Thus, the B-1700 provides full
extraction on one 24-bit register and 6-bit subfield addressing on
most others. The El emulator can extract any byte from the 15
microregisters for comparison or control purposes. The MLP-900
can conveniently use the third ALU input to select fields within
registers. Apart from the obvious hardware cost of selecting any
field in any register, space will be taken to identify the field
in microinstructions. It does not appear that high level languages
demand complete generality, and limitations could be accepted
simply on the grounds of coding efficiency.

2.2  Memory  Mapping  and  Address  Translation 

The unstructured nature of machine codes, allowing instructions
to be used as data, and vice versa, requires a strict correspondence
to be maintained between the target machine and its representation
in the host. (There are exceptions: in mapping the IBM
1401 onto the IBM 360 it is more convenient for the latter to use
EBCDIC character codes, converting to and from BCD in those
instructions sensitive to BCD formats.) In most instances the
target machine word is 'rounded up' when necessary to fit the
host, not attempting to make use of every bit in store. However,
the B-1700 goes to the length of resolving memory addresses to the
bit level and allowing any string of up to 24 bits to be read or
written, starting (or finishing) at a given position. In that
case 100% memory utilisation can always be achieved.
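B-1700-style bit-level addressing can be sketched as a memory whose addresses count bits rather than words. Any field of 1 to 24 bits can then be read or written starting at an arbitrary bit position. This is a minimal model, not the B-1700's actual mechanism. The class and method names are hypothetical, and numbering bits from the most significant end of each byte is an assumption.

```python
MAX_FIELD = 24  # longest field transferable in one access


class BitMemory:
    """Memory addressed to the bit. Bit 0 is the most significant bit of byte 0 (an assumption)."""

    def __init__(self, nbytes: int) -> None:
        self.data = bytearray(nbytes)

    def _check(self, addr: int, length: int) -> None:
        if not 1 <= length <= MAX_FIELD:
            raise ValueError("field length must be 1..24 bits")
        if addr < 0 or addr + length > 8 * len(self.data):
            raise IndexError("field lies outside memory")

    def read_bits(self, addr: int, length: int) -> int:
        """Return the field of `length` bits starting at bit address `addr`."""
        self._check(addr, length)
        value = 0
        for bit in range(addr, addr + length):
            byte, offset = divmod(bit, 8)
            value = (value << 1) | ((self.data[byte] >> (7 - offset)) & 1)
        return value

    def write_bits(self, addr: int, length: int, value: int) -> None:
        """Store the low `length` bits of `value` starting at bit address `addr`."""
        self._check(addr, length)
        for i, bit in enumerate(range(addr, addr + length)):
            byte, offset = divmod(bit, 8)
            mask = 1 << (7 - offset)
            if (value >> (length - 1 - i)) & 1:
                self.data[byte] |= mask
            else:
                self.data[byte] &= ~mask & 0xFF
```

For example, four 6-bit characters written at bit addresses 0, 6, 12 and 18 fill exactly three bytes with nothing wasted. That is the sense in which 100% utilisation is achieved.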
The memory word or part-word is made available for analysis
in the microregisters. It is an advantage to be able to select
from two or three potential data registers in order to avoid
extra 'move' microinstructions. At this point there is also the
opportunity to map the data into a more easily managed form. The
'crosspoints' of the El emulator and 'language boards' of the
MLP-900 both allow the choice by program of alternative hardwired
data paths to and from memory. They may be used, for example, to
prepare an instruction for decoding, to align 6-bit characters
to 8-bit byte boundaries, or to handle parity conventions on a
'foreign' data bus. The diagram shows the crosspoint paths used
by El to read ICL 1900 instructions, which enable function,
register and modifier fields to be accessed without shifting the
target instruction microregister. The effect of the crosspoint is
to save 5 or 6 steps in the typical interpretive loop of 25-30
microinstructions. It can be seen as complementing the internal
data selection functions: in a machine with powerful field
selection orders, crosspoints would be less important.

Apart from data, addresses have to be matched to the conventions
of the host. For example, if the target machine uses
decimal addressing and the host uses binary, then conversion must
take place before accessing the store. Similarly, if the target
machine operates in virtual program space, then virtual to real
translation is called for. If page and segment table accesses
are implicit in each memory reference, the address conversion could
easily exceed the combined steps of instruction decode and instruction
execution. The alternative of using hardware assistance (allowing
the host to work in virtual space) is expensive and still
leads to delay in memory access. Fortunately, in the environment
of high level language execution it is possible to work in a
virtual address space but avoid most of the overhead of address
translation.

2.3  Representing  the  Target  Machine  State 

The primary data of an interpretive program are the registers,
the program counter, the instruction register, control flags,
channel status and control words of the target machine. A
generalised host would expect to have room for the largest target
machine state of interest. Even so, it is unlikely to require
more than a few hundred bytes of storage for that purpose, which
often justifies a file of fast registers, the scratchpad (or
local memory in IBM terminology), in addition to the microregisters themselves.

It is a common requirement to access the scratchpad using an
index value. For example, a target machine 'register-register'
instruction contains two indices. Microinstructions do not admit
the type of address calculation found in machine instruction sets,
therefore it is necessary to carry out some preliminary scratchpad
address calculation. That happens often enough (at least
once in most target instructions) to justify building in predictive
indexing hardware, which works in the following way. Certain
microregister fields are designated (by preset parameters) as
scratchpad indices. When any of those field values changes, a
scratchpad access is initiated (relative to a preset base), so
that the corresponding scratchpad element is available for reading
or writing in the next microinstruction (compare the main
store address registers of the CDC 6600). The crosspoints for
the El emulator are designed to place the target instruction
register and modifier digits in the position of predictive indices,
allowing the register and modifier values to be used without delay.
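Predictive indexing can be modelled in software to show the timing idea. Designated microregister fields act as scratchpad indices. Writing a new value into one of them starts the scratchpad access at once, so the element is ready for the following microinstruction without any address calculation. The class, its field names and the preset base are hypothetical. In hardware the fetch overlaps the current microinstruction rather than happening inside a setter.

```python
from typing import Dict, Set


class PredictiveScratchpad:
    """Sketch of predictive indexing: changing an index field pre-fetches its element."""

    def __init__(self, size: int, base: int, index_fields: Set[str]) -> None:
        self.scratchpad = [0] * size
        self.base = base                  # preset base for predictive accesses
        self.index_fields = index_fields  # microregister fields designated as indices
        self.fields: Dict[str, int] = {}
        self.prefetched: Dict[str, int] = {}  # elements ready for the next step

    def set_field(self, name: str, value: int) -> None:
        changed = self.fields.get(name) != value
        self.fields[name] = value
        if name in self.index_fields and changed:
            # The access is initiated as soon as the index changes, relative to the base.
            self.prefetched[name] = self.scratchpad[self.base + value]

    def operand(self, name: str) -> int:
        """Use the pre-fetched element with no further address calculation."""
        return self.prefetched[name]

    def write_back(self, name: str, value: int) -> None:
        """Write through the same index, keeping the pre-fetched copy current."""
        self.scratchpad[self.base + self.fields[name]] = value
        self.prefetched[name] = value
```

Suppose the register field of a target instruction has been designated as an index. Once the crosspoint has placed the register digit in that field, the register's contents can be used directly.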


The primary data of a high level language machine are the
intermediate results, control flags, and the control, stack and
environmental pointers that allow access to contextually relevant
data. For the most widely used languages the 'state' can be
mapped into a register file quite easily. Moreover, its access
patterns correspond closely to those of conventional target
machines, hence the scratchpad organisation of a 'universal
emulator' is equally applicable to the major programming languages.
Whether there are alternative organisations suited to a wider
class of languages is a question we shall consider later. It might
be argued that a language is 'major' because it happens to fit
onto conventional hardware, and that when that constraint is
removed more attention can be given to problem-oriented languages.
2.4  Generalised  Control  of  Peripherals

At this point we must draw a broad distinction between
emulation of the non-privileged users' instruction set and that
of the operating system. The latter would include instructions
for channel selection, requesting device status and sending
commands, as well as receiving and sending data. It may also
include special addressing modes for channel control words, page
and segment table control, interrupt register and timer access,
handkeys, displays, fault indicators and so on. Full-scale
emulation, to the extent of running the target machine's peripherals,
engineering test programs, channel commands and operating
systems, involves at least twice the design effort of the non-privileged
instruction set alone. It will almost certainly involve
physical adaptation of the peripheral interfaces.

In the present context, recognising that most languages are
non-specific with regard to the means of peripheral control, the
preferred approach is to match the I-O statements to the host
system using machine language and microcode procedures.


2.5  The  Effect  of  Large  Scale  Integration

The level of complexity achievable in bipolar LSI devices has
reached the point of presenting complete slices (2 or 4 bits) of
control or arithmetic circuitry in a single package. However,
such circuits are only realised in favourable commercial/technical
situations, i.e. wide applicability and high functional content
in relation to edge connection. Some of the machine features
discussed above would fail on both counts. On the other hand, I
have indicated that language execution makes less stringent
demands than universal emulation. Hence the 'generality' aimed at
by device manufacturers may well provide effective support for
the target instruction sets of interest in the context of high
level languages.

How much does generality cost in terms of performance? That
is impossible to say without detailed analysis of a range of
target machines. An indication can be given by comparing the
vertical encoding of the ICL register-store 'ORX' instruction on
the El emulator with the horizontal form for the 1904E. In terms
of microorders, the El obeys 30 compared with 14 for the specialised
host. The breakdown (El:1904E) is sequence control 13:6, function
decode 5:2 and operand access 10:5. However, the most startling
figure in each case is the ratio of support activity to 'useful'
function: about 15:1. Our main concern in designing
language-oriented target machines must be to reduce that ratio.
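A short calculation checks the ratio quoted. It assumes that whatever remains after sequence control, function decode and operand access counts as the 'useful' function. The lecture does not spell out that split, so treat this as a reconstruction.

```python
def support_ratio(total: int, sequence: int, decode: int, access: int) -> float:
    """Support microorders per 'useful' microorder.

    Assumption: every microorder outside sequence control, function decode
    and operand access is counted as useful function.
    """
    support = sequence + decode + access
    useful = total - support
    return support / useful


el_ratio = support_ratio(total=30, sequence=13, decode=5, access=10)   # 28 support, 2 useful
host_ratio = support_ratio(total=14, sequence=6, decode=2, access=5)   # 13 support, 1 useful
print(f"El: {el_ratio:.0f}:1   1904E: {host_ratio:.0f}:1")
```

Both hosts come out at 13 to 14 support steps per useful step. This agrees with the 'about 15:1' quoted to within the precision of the assumed split.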
3.  INTERPRETATION  OF  HIGH  LEVEL  LANGUAGES


The existence of readily microprogrammed host machines
naturally gives rise to speculation about the likely return from
bypassing the normal instruction set. To do so successfully involves
the solution of a range of problems concerning definition, security,
expansion, maintainability and so on, whose solution is taken for
granted in conventional systems. Before looking at the broader
problems it would be reassuring to have some measure of the potential
advantage of microcoding, which is the subject of this lecture.

It is easy to find performance improvements in the region of
10:1 or more for a particular algorithm expressed in microcode
compared with machine code. In evaluating such figures it must be
remembered that they derive from three contributing sources:

(i) the inherent speed of microcode, which is the result of the
simplicity of the instructions and the use of high speed control
store; (ii) occasional advantages of the microfunctions over the
target machine functions, especially in bit manipulation and control
sequencing; and (iii) advantages gained from bypassing the
architectural framework of the target machine, especially its
protection mechanisms.

It would be meaningless to draw conclusions from isolated
algorithms. The minimum basis of comparison is taken to be the
combination of hardware and software supporting one of the major
programming languages, which provides the syntax and semantics
for a broad class of problems. The main parameters of performance
are taken to be:

(i)  compile and load time

(ii)  execution time

(iii)  size of the support system

(iv)  object program size

(v)  diagnostic aids in (i) and (ii)

The two techniques used for performance comparison are benchmark
testing and factoring. Benchmark testing obtains space and time measures for a
representative sample of source programs. Factoring infers
performance from independent measures on artificially
chosen statements. From the design point of view the second is
much more useful, though except in the case of Algol 60 there do
not appear to be any widely published sets of reference statements.
Needless to say, the object of design is to optimise performance
at a given system cost over a prescribed set of languages.

The weights attached to the measured parameters will vary from
one class of use to another, and no attempt will be made to determine
them here. The aim is to show how variations in processor
function, specifically those brought about by microprogramming,
affect the parameters (i)-(iv). At the same time the qualitative
effect of diagnostic aids will be assessed. It will be seen
that the time measures depend partly on the performance of a second
language, which will be referred to as the system implementation
language (SIL). So whether the machine is good at compiling
Fortran, say, depends on what it has to do to produce executable
code, and on how well it does it. As far as possible the second factor
will be isolated by measuring the overall performance of run
time support modules. The same applies to execution of the functions
of the language by stored microprogram or hardware, because
that does not usually vary from one language implementation to
another and it can be measured in basic arithmetic speeds. It
would be relevant, however, if one implementation chose to use a
decimal radix while another implementation of the same language
on the same machine used binary. Most of the language implementations
reported in the literature have been rendered useless from
the design point of view by not keeping the executive algorithms
constant. In other words, if a performance gain P is generated,
it is impossible to tell how much of P derives from the interpretive
technique and how much from improved arithmetic or run-time
support.

The following subsections make a broad distinction between
procedure coding, illustrated by some of the scientific languages,
and data access, which is examined in the context provided by
Cobol.

3.1  Algol,  Euler  and  Expression  Evaluation

Factored measurements of Algol performance are reported by
Wichman (1973). In Table 1 I have abstracted some figures for
machines with roughly comparable arithmetic times. It is well
known that the Burroughs B-6700 uses a target instruction set
tailored to the representation of Algol; its effect can be seen
in the times for procedure entry. One would also expect it to be
effective in array assignment, but in this particular case the
compilers spot the indices [1,1] etc. and generate optimised code
for the conventional machines. The advantage of the language-oriented
code is to simplify the compiler rather than speed up
execution.

The importance of individual statement times depends on the
weights attached to them in the final performance measure. In
general, arithmetic and array access operations have the highest
weights. Procedure entry is an order of magnitude less important,
and array declarations an order of magnitude less than that. It
must be remembered that experimentally observed times reflect a
complex combination of hardware, software and support system.
Implicit in many decisions is the designer's assessment of
different language features, and his budget reflects an assessment
of the importance of the language as a whole.


TABLE 1

SOME ALGOL STATEMENT EXECUTION TIMES
(execution time in microseconds)

Statement                       B-6700   IBM 370/165   Univac 1108

x := 1.0                           5.5           1.4           1.5
x := 1                             2.7           1.9           1.5
x := y                             3.9           1.4           1.5
x := y + z                         5.5           1.4           3.4
x := y * z                        11.3           1.4           4.0
e1[1] := 1                         5.3           1.6           2.7
e2[1,1] := 1                       7.7           1.7           5.8
e3[1,1,1] := 1                    11.3           1.7           9.0
begin array a[1:500]; end          408           242           918
p1(x)                             28.6          60.7           127
p2(x,y)                           30.5          83.6           137

[Note: The times for the IBM 370 probably err on the low side
because of the effect of the cache.]
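Factoring combines such statement times into a single figure by weighting them. The sketch below uses a subset of Table 1. The weights are hypothetical and chosen only to follow the orders of magnitude given in the text: arithmetic and array access highest, procedure entry ten times less, and array declaration ten times less again.

```python
# Times in microseconds, taken from Table 1 (a subset of the statements).
TIMES = {
    "x := y + z":                {"B-6700": 5.5,   "IBM 370/165": 1.4,   "Univac 1108": 3.4},
    "e1[1] := 1":                {"B-6700": 5.3,   "IBM 370/165": 1.6,   "Univac 1108": 2.7},
    "p1(x)":                     {"B-6700": 28.6,  "IBM 370/165": 60.7,  "Univac 1108": 127.0},
    "begin array a[1:500]; end": {"B-6700": 408.0, "IBM 370/165": 242.0, "Univac 1108": 918.0},
}

# Hypothetical weights, each class an order of magnitude below the one before.
WEIGHTS = {"x := y + z": 100, "e1[1] := 1": 100, "p1(x)": 10, "begin array a[1:500]; end": 1}


def factored_time(machine: str) -> float:
    """Weighted mean statement time, in microseconds, for one machine."""
    total_weight = sum(WEIGHTS.values())
    return sum(w * TIMES[s][machine] for s, w in WEIGHTS.items()) / total_weight


for machine in ("B-6700", "IBM 370/165", "Univac 1108"):
    print(f"{machine:12s} {factored_time(machine):6.2f}")
```

Raising the weight on procedure entry would favour the B-6700, so the ranking depends on the weights. They must therefore be stated before machines are compared.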

In comparing object code size, Wichman gives the following
figures normalised with respect to Atlas:

Burroughs  B-5500  0.16 

Univac  1108  0.31 

CDC-6600  0.56 

The advantage of the Algol-oriented intermediate form in comparison
with some of the best conventional systems is evident. To
understand  how  such  results  are  obtained  we  must  examine  some 
target  machine  states  and  the  functions  applied  to  them. 

The advantage of language-oriented intermediate code is that,
provided an 'expression-evaluation' mechanism is built into the
interpreter, the details of register transfers that are usually
found in machine code can be omitted. The compiler is simplified and
the code is more compact. It is not inherently faster, because
the data access is indirect, but in many instances that is more
than compensated by savings in other parts of the microprogram. The
stack mechanism is the best known means of expression evaluation.
The reader is no doubt familiar with the reverse Polish form of
code used in the Burroughs B-6700 and other machines, and with the various
stack and environmental (display) pointers associated with it.
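A stack evaluator for reverse Polish code shows why the compiler need not mention registers. Operands are pushed in order, and each operator finds its arguments on top of the stack. This is a generic illustration of the technique, not the B-6700 operator set. The syllable encoding is an assumption: strings stand for names and operators, and numbers stand for literals.

```python
import operator
from typing import Dict, List, Union

Syllable = Union[str, int, float]

# Binary operators: each pops two operands and pushes one result.
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_rpn(code: List[Syllable], env: Dict[str, float]) -> float:
    """Evaluate reverse Polish code; no register names appear in the code."""
    stack: List[float] = []
    for syllable in code:
        if isinstance(syllable, str) and syllable in BINARY:
            right = stack.pop()          # top of stack is the right operand
            left = stack.pop()
            stack.append(BINARY[syllable](left, right))
        elif isinstance(syllable, (int, float)):
            stack.append(syllable)       # literal operand
        else:
            stack.append(env[syllable])  # value call on a named variable
    if len(stack) != 1:
        raise ValueError("malformed expression: stack should hold exactly one result")
    return stack[0]
```

For x := y + z the compiler emits only `y z +`. The display and stack pointers that locate y and z in their environment are reduced here to a single dictionary.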
However, the apparent simplicity of the Burroughs representation
leads to some complexity in the machine functions themselves.
The value call operator (VALC) has to be able to detect and
interpret all the operand types that can legitimately be presented
in the course of computation, including indirect references
through the stack and procedural definitions arising in parameter
lists. In most applications the questions answered by examining
tags could be answered in advance by the compiler. As a general
rule, unnecessary tests at execution time should be avoided except
as deliberate backup for the compiler, the support system or data
security.

In contrast, dynamic tag testing is essential to languages
such as Euler and APL because the type of a variable is not predictable
at compile time. Let us examine the Euler representation
in greater detail and see how one of the target machine syllables
fits onto the architecture of the IBM 360/Model 30 described in
the first lecture (for greater detail, see Weber (1967)).

The representation of a variable is a [tag, value] pair, the
tags having the following significance:

0  Null                      5  Reference (m, loc)
1  Integer                   6  Procedure (m, link)
2  Real                      7  List (length, loc)
3  Boolean                   8  (Unassigned)
4  Label (mp, pa)            9  Block mark (in stack)

The run-time environment consists of three storage areas:

. Program, which is indexed by pa (program address) and link (return address);
. Variable, indexed by loc (location), where all defined data is to be found;
. the Stack, which consists simply of block marks giving static and dynamic chain links, references to parameters in the Variable space, and intermediate results.

Operators exist to test the tag of a variable. For example,

    isn A        is A an integer?

returns the boolean value true or false. Standard operators such
as + - * / mod max abs can be applied to numeric values, yielding
numeric results, and failing if illegal tags are encountered.
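Tagged [tag, value] pairs with a tag-testing operator and a checked arithmetic operator can be sketched as follows. The tag numbers come from the list above. Two choices are assumptions, not Euler's definition: the Python names, and the rule that mixing an integer with a real yields a real.

```python
from enum import IntEnum
from typing import NamedTuple


class Tag(IntEnum):
    NULL = 0
    INTEGER = 1
    REAL = 2
    BOOLEAN = 3
    LABEL = 4
    REFERENCE = 5
    PROCEDURE = 6
    LIST = 7
    UNASSIGNED = 8
    BLOCK_MARK = 9


class Value(NamedTuple):
    tag: Tag
    value: object


NUMERIC = {Tag.INTEGER, Tag.REAL}


def isn(a: Value) -> Value:
    """'isn A': is A an integer? Always yields a Boolean."""
    return Value(Tag.BOOLEAN, a.tag == Tag.INTEGER)


def add(a: Value, b: Value) -> Value:
    """'+' on numeric values; fails on any other tag (checked at run time)."""
    if a.tag not in NUMERIC or b.tag not in NUMERIC:
        raise TypeError(f"illegal tags for +: {a.tag.name}, {b.tag.name}")
    tag = Tag.INTEGER if a.tag == b.tag == Tag.INTEGER else Tag.REAL  # assumed result rule
    return Value(tag, a.value + b.value)
```

Every call to `add` pays for the tag test. This is the run-time cost that dynamic typing forces on the interpreter.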

A list  is  an  ordered  set  of  values,  each  of  which  is  either  an 
elementary  type  or  a list.  Lists  can  be  created  dynamically,  and 
operators  exist  for  enquiring  the  length,  detaching  the  tail, 
selecting  an  element  and  concatenating  two  lists.  The  existence 
of  reference  variables  causes  the  variable  space  to  be  maintained 
by  scanning  pointers  and  recovering  space  which  is  no  longer 
referenced,  updating  pointers  when  compacting  the  active  store 
areas. 
The Euler program area consists of sequences of operator
syllables (bytes), each followed by the appropriate number of
bytes giving literal values or indices. The program is represented
in reverse Polish form. For example, the statement:

    'if v < n or t = 0 then d else e'

would be represented by the following string of 27 bytes:


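[Diagram: the 27-byte string. Each load syllable (for @v, @n and @t) is followed by its [block number, displacement] bytes. The other syllables include valc, a literal 0, EQ, and the connectives or (with destination pa(d)) and then (with destination pa(e)). The or tests for true: if yes, go to d; if no, continue with load @t. The then tests for true: if yes, continue to d; if no, go to e.]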

Note that the @ operator forms a reference on the stack, which
valc converts to the corresponding value. The translation is
thus a simple reordering of the input string, replacing variables
by [block number, displacement] pairs. The latter are converted
into [mark number, loc] pairs on loading to the stack. In the
program, the logical connectives give a destination to which control
passes if the top of stack element has the required value.
Figure 4 gives the microcode for the and, or and then operators.

A Boolean variable has the binary form '0011000y', i.e. tag 3
and value y, with y = 1 for true. The microregisters IJ are used as
program counter, and UV points to the top of stack. For simplicity,
the address incrementing microorders, which are really byte-serial,
have been written as 'IJ + 1' etc.
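The or connective acting on a Boolean byte of the form 0011000y can be sketched in Python. It follows the behaviour spelled out by the IBM 360 sequence given for comparison in the text:

. if the top of stack is true, control passes to the destination and the value stays on the stack;
. if it is false, it is popped and evaluation continues;
. anything else is a type error.

The function names, and the representation of the stack as a list of bytes, are assumptions made for illustration.

```python
from typing import List

BOOLEAN_TAG = 3


def encode_boolean(y: bool) -> int:
    """Byte form 0011000y: tag 3 in the high four bits, truth value in the low bit."""
    return (BOOLEAN_TAG << 4) | int(y)


def or_connective(stack: List[int], pc: int, destination: int) -> int:
    """Return the next program address after an Euler 'or' syllable.

    pc is the address following the syllable and its destination bytes.
    """
    top = stack[-1]
    if top == encode_boolean(True):
        return destination   # result already known: leave 'true' on the stack
    if top != encode_boolean(False):
        raise TypeError("operand of 'or' is not a Boolean")
    stack.pop()              # discard 'false'; the second operand decides
    return pc
```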

The sample microsequence checks the tag of the operand and
interprets the logical connective in 8 microinstructions, 4 main
memory cycles, or 6 µsec (7.5 if false). The corresponding IBM
360 target instructions would take the form:

         CLI   0(STACK),LOGT
         BE    ORTRUE
         CLI   0(STACK),LOGF
         BNE   TYPERROR
         SH    STACK,=H'4'

The interpretation of that sequence takes 32 µsec if 'true' and 90
µsec if 'false'. It occupies 24 bytes of program as opposed to
3. That puts microprogram interpretation in its most favorable
light: dynamic type assignment, minimal arithmetic content and
naive compiling techniques. It is easy to see that even with
dynamic type assignment it is often possible for the compiler to
predict the result of an operation as far as type is concerned,
and to omit further checks, as in:

    if x = y

which must give a Boolean on top of the stack.

[Figure 4: Microcode for Euler Logical Connectives. The flowchart begins at CYCLE, which fetches the operator syllable (MN ← IJ, READ MAIN, IJ ← IJ + 1) and branches on its bits. The top of stack is fetched (MN ← UV, READ MAIN) and its tag tested (TYPE TEST, going to an error exit on failure). When the branch is taken, the destination bytes are fetched (MN ← IJ, READ MAIN) and loaded into I and J. Control then returns to CYCLE.]

The advantage in space which results from the syllabic form of
target instruction is a combination of two effects:

. the localisation of the operator/operand space implied by the source language;
. the use of working registers implied by the stack.

It would be possible to compress an operand 'address' to 3 or 4 bits, for
example, provided changes of 'context', in which the full meaning
of the operand is expanded, can be effected without excessive
overhead. Unfortunately, very little is known about the consequences
of one choice or another. It is not even clear that procedure
boundaries should play a part in defining context. The use
of a stack mechanism may not be optimal. It involves some
run-time maintenance activity which a compiler could
avoid, and it is known that the majority of expressions found in
practice are of very simple forms which do not require the full
generality of stack evaluation. Hoevel and Flynn (1977) suggest
an alternative primitive form of instruction which recognises many
important special cases. Space gains of up to 5:1 for Fortran
compared with the IBM System 370 optimising compiler are reported.

3.2  Cobol  Interpretation 

The major parts of a Cobol program are the Data and Procedure
Divisions. The program operates on files of records and uses
internal records for workspace. Each possible record format is
declared in the Data Division. The same physical record may be
mapped according to many different declarations, so there is no
question of concealing representations or placing descriptive tags
as parts of the record. The elementary items of data have a wide
variety of representations, with a dozen or so basic data types.

The elementary items are named, and may be collected into named
groups, which in turn may be grouped, up to the level of the
record name itself. With the aid of PICTURE descriptions, editing
characters can be inserted in a field for output (and conversely
for input). As a result, the 'type' code associated with a
data item can be of almost any length.

Within a record, individual items or groups of items may be
repeated. The number of actual occurrences may vary, depending
on a field in a fixed position in the same record. Repeated items
are selected in the Procedure Division in one of two ways: by following
the repeated group or field name with one or more subscripts, or by using an
implied index value. The coefficients of the associated storage
mapping function can be determined by the compiler.
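The storage mapping function for repeated items is linear in the subscripts, which is why the compiler can fix its coefficients in advance. The sketch below computes a byte offset within the record. Checking against the current occurrence count is shown as an option. The record layout in the example is hypothetical: a MONTH group occurring 12 times, each occurrence holding a 5-byte DAY item that occurs 31 times.

```python
from typing import Optional, Sequence


def element_offset(first: int, subscripts: Sequence[int], strides: Sequence[int],
                   occurs: Optional[Sequence[int]] = None) -> int:
    """Byte offset within the record of one occurrence of a repeated item.

    first   -- offset of the first occurrence (all subscripts equal to 1)
    strides -- bytes between successive occurrences at each level (compiler-determined)
    occurs  -- current number of occurrences at each level, if bounds are to be checked
    Cobol subscripts start at 1.
    """
    if len(subscripts) != len(strides):
        raise ValueError("need one subscript per level of repetition")
    if occurs is not None:
        for s, n in zip(subscripts, occurs):
            if not 1 <= s <= n:
                raise IndexError(f"subscript {s} outside 1..{n}")
    return first + sum((s - 1) * c for s, c in zip(subscripts, strides))


# Hypothetical record: DAY (5 bytes) OCCURS 31 within MONTH OCCURS 12, starting at byte 10.
print(element_offset(10, [2, 3], [31 * 5, 5], occurs=[12, 31]))
```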


The Procedure Division is composed of a number of Segments,
whose significance derives from the days of programmed overlays.
A Segment comprises a number of labelled paragraphs, each containing
one or more sentences. A sentence consists of one or more
Cobol statements.


Individual statements have a fairly simple syntax: a verb
followed by data names and Segment or paragraph names, e.g.

    ADD P TO Q GIVING DAY-TOTAL ROUNDED

where P, Q and DAY-TOTAL are data names. The definition of Cobol
implies strict observation of decimal rounding and truncation, and
is subject to the types of operands and the size of intermediate
results (18 digits). The compiler is required to indicate if
operands are incompatible, or if intermediate results are out of
range. Some indication of verb frequencies is given by the
following measures from a benchmark test:


VERB        DYNAMIC USAGE   STATIC USAGE

MOVE             30%             33%
IF               30%             18%
GOTO             11%             19%
ADD              10%              6%
PERFORM           7%              8%
WRITE             4%              3%
READ              3%              2%
Others            5%             11%
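The totals for the seven listed verbs can be checked directly from the table. This only restates the benchmark figures.

```python
# (dynamic %, static %) for each verb, from the benchmark table.
USAGE = {
    "MOVE": (30, 33), "IF": (30, 18), "GOTO": (11, 19), "ADD": (10, 6),
    "PERFORM": (7, 8), "WRITE": (4, 3), "READ": (3, 2),
}
OTHERS = (5, 11)

dynamic_share = sum(dynamic for dynamic, _ in USAGE.values())
static_share = sum(static for _, static in USAGE.values())
print(f"seven verbs: {dynamic_share}% of executed, {static_share}% of stored statements")
```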

Thus for execution purposes seven verbs account for 95% of
statements, while the same seven account for almost 90% of stored
statements. The target code can be chosen purely as a compromise
between compiler and microcode, without concern for reconstructing
the source string (which affects APL coding, for example). The
final form depends on what are regarded as reasonable limits for
field sizes in one Cobol source module. In the target instructions
listed in Table 2 the maxima are taken to be:


Variables: 4096; Indices: 256; Files: 256; Data areas: 64;
Procedure variables: 256.


In the design used here, which is based on a Cobol interpreter
written for the ICL El emulator, each Cobol statement is represented
by a sequence of 16-bit target instructions.
TABLE 2

A COBOL TARGET INSTRUCTION LANGUAGE

Format #1 (16 bits: f = 4 bits, n = 12 bits)

  f = 0:   Source operand at DQT[n]
  f = 1:   Destination at DQT[n]
  f = 2:   Operand at DQT[n]
  f = 3:   Operand n
  f = 6:   Branch within code area, offset n

Format #2 (16 bits: f = 4 bits, v = 4 bits, n = 8 bits)

  f = 7:   n-byte literal operand, type v
  f = 8:   Scale operand, partial result, ..., by n
  f = 9:   Arithmetic; scale first operand by n
           v = [ADD, SUBTRACT, SUBTRACT-GIVING, MULTIPLY,
                DIVIDE, DIVIDE-REMAINDER, ..., etc]
  f = 10:  Branch DEPENDING, via procedure variable n
  f = 11:  Branch n, depending on condition v
  f = 13:  v = [MOVE, COMPARE, SET INDEX, DEBUG, STOP,
                and call RUNTIME support]

  RUNTIME: ACCEPT TIME, DATE, DAY, DISPLAY,
           OPEN, CLOSE, READ, WRITE, REWRITE, START, DELETE,
           CANCEL, CALL, EXIT, etc.
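Decoding the two formats of Table 2 is a matter of splitting the 16-bit word at fixed boundaries. The sketch below assumes that f occupies the most significant four bits; the table gives only the field widths. Function codes not listed in the table (4, 5, 12, 14, 15) are treated as errors. The names are hypothetical.

```python
from typing import Optional, Tuple

FORMAT_1 = {0, 1, 2, 3, 6}        # f (4 bits) | n (12 bits)
FORMAT_2 = {7, 8, 9, 10, 11, 13}  # f (4 bits) | v (4 bits) | n (8 bits)


def decode(word: int) -> Tuple[int, Optional[int], int]:
    """Split a 16-bit target instruction into (f, v, n); v is None in format #1."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("target instructions are 16 bits")
    f = word >> 12
    if f in FORMAT_1:
        return f, None, word & 0x0FFF
    if f in FORMAT_2:
        return f, (word >> 8) & 0xF, word & 0xFF
    raise ValueError(f"function code {f} is not listed in Table 2")
```

The field widths also account for the maxima listed earlier. A 12-bit n reaches 4096 DQT entries, and an 8-bit n reaches 256 procedure variables.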


Cobol control structure is the source of some complexity because
of the use of procedure variables and debugging options.
Apart from the normal branching determined by GOTO statements, it
is possible to specify that a particular paragraph or sequence of
paragraphs should be PERFORMed in one of three ways: one or more times, until a
condition is satisfied, or repeatedly while varying some elements on each
repetition. A simple compiler cannot tell in advance which
paragraphs will be the subject of PERFORM, so it will insert a
possible branch to a 'procedure variable' at the end of each
paragraph. If PERFORM does not apply, the branch 'drops through'
to the next paragraph in sequence. Further complication derives
from the ALTER verb, which can be used to change the destination
of a GOTO. Rather than change the stored object code, the branch
is again directed through the procedure variable table.
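The procedure variable mechanism can be sketched as a table consulted by the branch at the end of each paragraph. This is a hypothetical, simplified model. PERFORM plants a return address that is cleared once used, and ALTER overwrites a GOTO's entry instead of patching the object code. Nested or overlapping PERFORM ranges, which a real run-time system must handle, are ignored.

```python
from typing import List, Optional


class ProcedureVariables:
    """Hypothetical procedure-variable table; None means 'drop through'."""

    def __init__(self, count: int) -> None:
        self.table: List[Optional[int]] = [None] * count

    def perform(self, exit_var: int, return_address: int) -> None:
        """PERFORM: plant a return address at the exit of the last paragraph performed."""
        self.table[exit_var] = return_address

    def paragraph_exit(self, exit_var: int, next_paragraph: int) -> int:
        """The branch compiled at the end of every paragraph."""
        target = self.table[exit_var]
        if target is None:
            return next_paragraph     # not being PERFORMed: drop through
        self.table[exit_var] = None   # simplification: restore drop-through after return
        return target

    def alter(self, goto_var: int, destination: int) -> None:
        """ALTER: change a GOTO's destination without touching the object code."""
        self.table[goto_var] = destination

    def goto(self, goto_var: int) -> int:
        target = self.table[goto_var]
        if target is None:
            raise RuntimeError("GOTO through an unset procedure variable")
        return target
```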

The complication arising from debugging is that any attempt
to access a named data item, paragraph, file or index may be
required to enter a debug procedure. In most compilers that means
that the code generated for handling debugged elements is different
from (and slower than) normal code, even when executing with
DEBUG OFF. In interpretive systems the same target code is
generated in all cases and the branch is taken in the interpreter.
In the Data Division all names are mapped unambiguously into
indices in the data qualifier table (DQT), the file table and the index
table. Procedure variables are indexed in the Procedure Division.
Information built up during the compilation phase can be carried
over into execution without change in many cases. Figure 5 shows
the modular structure of Cobol as far as it affects the interpreter.
The DQT contains a 64-bit descriptor for each variable,
giving:

. the index of the base pointer for the record currently
  containing the variable
. offset and limit of the variable within the record area
. whether the debug option applies
. operand type and scaling information
. if subscripted, the index of mapping parameters in the
  subscript information table
. if edited, the index of editing parameters in the edit
  information table.

At run time the data qualifier element DQT[n] is interpreted to give the address pointer to a sequence of bytes (or bits) within the area defined by the base. About 20 microsteps are required to extract the data attributes and place them in microregisters, followed by whatever is needed to extract the data itself and present it for the next operation. Hence the management of the DQT represents a significant part of the interpretive overhead.
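To make the descriptor concrete, here is a minimal Python sketch of packing and decoding a 64-bit DQT entry. The lectures list the fields but not their widths or order, so the layout in `FIELDS` is an assumption chosen only so that the widths add up to 64; the names `pack`, `unpack` and `resolve` are likewise hypothetical. Each field extraction corresponds roughly to the shift-and-mask work the interpreter performs in microsteps before the data itself can be fetched.

```python
# Hypothetical DQT layout: (field name, width in bits), most significant first.
FIELDS = [
    ("base", 6),        # index of the base pointer for the containing record
    ("offset", 16),     # offset of the item within the record area
    ("limit", 12),      # length (limit) of the item
    ("debug", 1),       # 1 if the debug option applies
    ("type", 4),        # operand type
    ("scale", 5),       # scaling information
    ("subscript", 10),  # index into the subscript information table (0 = none)
    ("edit", 10),       # index into the edit information table (0 = none)
]
assert sum(width for _, width in FIELDS) == 64


def pack(**values):
    """Build a 64-bit descriptor from named field values (missing fields = 0)."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Extract every field, least significant first, by masking and shifting."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def resolve(word, base_pointers):
    """Return the (start, end) byte range of the item in the current record."""
    d = unpack(word)
    start = base_pointers[d["base"]] + d["offset"]
    return start, start + d["limit"]


base_pointers = [0, 4096]           # record areas currently bound to bases 0 and 1
dq = pack(base=1, offset=20, limit=8, type=3)
print(resolve(dq, base_pointers))   # (4116, 4124)
```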

To assess Cobol performance, the time and space requirements of a set of test statements were measured, and final figures of merit were obtained by weighting the results according to dynamic or static usage. For space, a gain of 1:3 resulted in comparison with the ICL 1900 program requirements. It appeared possible to improve on that by adding to the function set. For time, an overall improvement of 1:2.5 was observed in comparison with the conventional compiler on the ICL 1900. That figure is disappointing. It is accounted for in part by the arithmetic complexity of Cobol. Nevertheless the average Cobol statement appears to need about 200 microsteps (as opposed to 500), and in several instances the conventional compiler generates code that runs faster than the interpreter, for much the same reason as we saw earlier in looking at Algol implementations. However, another factor proves to be significant: the time spent in the interface between the language interpreter and the supporting SIL.
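The lectures do not say exactly how the weighting was done. One plausible reading, sketched below purely as an assumption, is to weight each test statement's cost by its frequency of use (dynamic counts for time, static counts for space) and compare the weighted totals. `figure_of_merit` is a hypothetical helper, and no measured data are reproduced here.

```python
def figure_of_merit(measurements):
    """measurements: iterable of (weight, baseline_cost, interpreter_cost).

    Weights are usage frequencies: dynamic counts when costs are times,
    static counts when costs are sizes. Returns the baseline:interpreter
    ratio of the weighted totals, so 2.5 means the interpreter is 2.5 times better.
    """
    rows = list(measurements)
    baseline = sum(w * b for w, b, _ in rows)
    interpreted = sum(w * n for w, _, n in rows)
    if interpreted == 0:
        raise ValueError("interpreter total is zero")
    return baseline / interpreted
```

Comparing weighted totals, rather than averaging per-statement ratios, stops a rarely used but dramatically improved statement from inflating the overall figure.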
4. INTERPRETIVE SYSTEM DESIGN

Improving on the range-defined instruction sets of fifteen years ago without meeting comparable system objectives is not particularly difficult. To present a realistic alternative it must be shown how programming standards can be maintained through a very wide power range; it must be possible to develop and maintain new languages and subsystems taking full advantage of the architecture without endangering system security; storage and control structures must be created to suit modern applications rather than those of the early 1960's. As far as I know, no 'microsystem' has been developed with the required properties. Even so, it is not sufficient to show that variable microcode achieves better results than fixed instruction sets: we also need to be convinced that it is the best way of using modern technology. In this lecture I shall draw together some of the results observed in language-oriented machine design and suggest two alternative system frameworks in which the demonstrated advantages could be retained.

4.1 The Effect on Language Parameters

As I have already indicated, many of the measures of language performance are affected strongly by the choice of supporting system, which we suppose to be reflected in the semantics of the System Implementation Language (SIL). For example, suppose the SIL is in fact a copy of the Executive package of a conventional machine range, and that a Cobol application package is obeyed (a) using the fixed instruction set and (b) using a Cobol target code such as discussed in the last lecture. Then the observable effect on storage requirements would be as follows (using typical figures for the ICL 1900):

                                          (a) Fixed instr.   (b) Fixed + Cobol
Fixed instruction microcode                   16 Kbyte           16 Kbyte
Cobol target microcode                         0                  9 Kbyte
Executive (kernel) functions                  16 Kbyte           16 Kbyte
System functions (spooling,
  command language, etc.)                     20 Kbyte           20 Kbyte
Cobol run-time support                        25 Kbyte           25 Kbyte
Cobol application - data (say)                 9 Kbyte            9 Kbyte
                  - code (say)                 9 Kbyte            3 Kbyte
Total                                         95 Kbyte           98 Kbyte
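The totals in the table can be checked, and the trade-off isolated, with a short Python calculation that uses only the table's own figures: the 6 Kbyte saved in application code is outweighed by the 9 Kbyte of added Cobol microcode.

```python
# Storage figures (Kbyte) from the ICL 1900 table:
# (a) fixed instruction set only, (b) fixed set plus Cobol target microcode
budget = {
    "fixed instruction microcode": (16, 16),
    "Cobol target microcode": (0, 9),
    "Executive (kernel) functions": (16, 16),
    "system functions": (20, 20),
    "Cobol run-time support": (25, 25),
    "application data": (9, 9),
    "application code": (9, 3),
}
fixed_total = sum(a for a, _ in budget.values())
cobol_total = sum(b for _, b in budget.values())
code_saving = budget["application code"][0] - budget["application code"][1]
extra_microcode = budget["Cobol target microcode"][1]
print(fixed_total, cobol_total)        # 95 98
print(code_saving, extra_microcode)    # 6 9
```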

In other words, the reward for a great deal of effort and investment in control memory is negligible as far as storage is concerned. Of course, one can present the picture in other ways and use the speed gain to advantage if there is sufficient I-O capacity, but the point remains that unless the support system gains similar advantages from the interpretive techniques the improvement in language performance will be seriously diluted. Let us assume, therefore, that the SIL itself benefits from the use of microprogram. The effect may be seen as space reduction and a gain in speed; more probably it will be seen as improvement in function and flexibility. In reviewing the parameters listed earlier some of the requirements of the SIL will be noted.

(i) Compile and Load Time

Substantial gains in speed (say a factor of 5) can be made in the portions of a compiler concerned with lexical and syntax analysis, and to a lesser extent in code generation, by microcode interpretation of syntax tables. Where in-line coding has been used in the past the speed gain is smaller, but a significant saving in space is achieved by table-driven techniques. Compile time is indirectly affected by the choice of object code under (ii).
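As an illustration of interpreting syntax tables, the sketch below drives a lexical scanner entirely from a transition table, so that changing the language means changing data rather than code. It is a minimal Python sketch assuming a simple finite-state table; the character classes, state names and token kinds are invented for the example and are not taken from any compiler described in these lectures.

```python
def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


# state -> {character class: next state}; a missing entry ends the token.
TABLE = {
    "start":  {"letter": "ident", "digit": "number", "space": "start", "other": "punct"},
    "ident":  {"letter": "ident", "digit": "ident"},
    "number": {"digit": "number"},
    "punct":  {},
}
# Final states and the token kind each one yields.
ACCEPT = {"ident": "IDENT", "number": "NUMBER", "punct": "PUNCT"}


def tokens(text):
    """Scan text by table look-up alone; the loop knows nothing of the language."""
    out, i = [], 0
    while i < len(text):
        state, start = "start", i
        while i < len(text):
            nxt = TABLE[state].get(char_class(text[i]))
            if nxt is None:
                break
            if nxt == "start":      # whitespace: begin the token after it
                start = i + 1
            state = nxt
            i += 1
        if state in ACCEPT:
            out.append((ACCEPT[state], text[start:i]))
    return out


print(tokens("MOVE A1 TO B"))
# [('IDENT', 'MOVE'), ('IDENT', 'A1'), ('IDENT', 'TO'), ('IDENT', 'B')]
```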

Load time is normally determined by the supporting system. If all programs have to be mapped into a (virtual or real) linear store, the time and space overheads in starting a job step may be significant (comparable with the compiler itself in many conventional systems). Moreover, the operating inconvenience is significant and may result in such anomalies as separate 'batch' and 'load-and-go' language systems. There is no reason, however, why the SIL functions should not allow program execution with explicit structure. For example, the operating environment shown in Figure 3 can be maintained with no appreciable execution overhead on the part of the SIL. In that case, the load time is negligible.

(ii) Execution Time

Excluding arithmetic and I-O, execution time is governed by the time of access to variables and the change of control environments, i.e. the subsets of the program space immediately available from particular points in the program. It is the 'localisation' of the environment which allows short addresses to be used and produces the greatest contribution to code compaction. The diagram shows the components of a generalised access chain. Data elements are assumed to be created in blocks (activation records or file areas) which are not necessarily contiguous in store, but selectable by an index n. Data identifiers in the source text are mapped into indices m, which are used to refer to a table of attributes (cf. the DQT in Cobol) which give record pointer, offset, size, type, and possibly other information derived by the compiler and required during execution. In general, several sets of attributes may refer to the same record, and one set of attributes can refer to several record areas (through dynamic adjustment of the control environment).
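The access chain can be sketched as a sequence of table look-ups. In this hypothetical Python model, `Attributes` plays the role of an attribute table entry (cf. the DQT), `environment` maps record indices n to the base of the record currently selected, and `store` is a flat byte array; none of these names come from the lectures. Changing an entry in `environment` shows how one set of attributes can refer to several record areas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    record: int   # index n of the block (activation record or file area)
    offset: int   # position of the item within the block
    size: int     # length in bytes
    type: str     # operand type, kept for the operation that follows


def access(m, attributes, environment, store):
    """Follow the chain: identifier index m -> attributes -> record index n
    -> current record base in the control environment -> bytes in store."""
    a = attributes[m]
    base = environment[a.record]
    return bytes(store[base + a.offset : base + a.offset + a.size])


store = bytearray(b"HELLOWORLDhelloworld")
attributes = [Attributes(0, 0, 5, "char"), Attributes(0, 5, 5, "char")]
environment = {0: 0}                              # record 0 currently at byte 0
print(access(1, attributes, environment, store))  # b'WORLD'
environment[0] = 10                               # the control environment changes
print(access(1, attributes, environment, store))  # b'world'
```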
[Diagram: generalised access chain. Components: OBJECT CODE, ATTRIBUTES, CONTROL ENVIRONMENT (static, dynamic), DATA STORAGE]


Languages differ in the amount of attribute information carried into the execution phase, the method of changing the control environment, the time at which attributes are assigned, and hence in the ways of distributing components of the access chain in storage. In Fortran, for example, attributes and record pointers can be absorbed into the object code; in APL the object code and attributes are dynamically assigned; in Algol the (g,n) pair and size can be absorbed into the object code while the type is sometimes attached to the data in the form of a tag. Where explicit maintenance of attributes and environment is demanded by the language there can be significant gains from using microcode. The ratio of addressing and control instructions to arithmetic in the output of a conventional compiler is in the region of 4:1, so, assuming a 5:1 speed increase from microcoding the former, an overall speed gain of 5:1.8, or about 2.8:1, is indicated (the 4 units of addressing and control shrink to 0.8, so 5 units of work take 1.8). One would expect more for the highly structured or 'dynamic' languages. Further speed gains can be expected where specialised arithmetic functions are called for, e.g. array, complex, controlled-precision or character string manipulation. A minimum overall gain of 3:1 in speed over a 'production' compiler to range standards would be a realistic objective for the languages in common use.
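The 2.8:1 estimate is an instance of what is now usually called Amdahl's law: only the accelerated fraction of the work gets faster. A short Python check, assuming (as the estimate implicitly does) that addressing and arithmetic instructions take comparable time on the conventional machine; `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction_accelerated, factor):
    """Overall gain when only part of the execution time is sped up."""
    if not 0.0 <= fraction_accelerated <= 1.0 or factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor positive")
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / factor)


# Addressing and control : arithmetic = 4:1, so 4/5 of the work is microcoded
# at a 5:1 gain; 5 units of work then take 1 + 4/5 = 1.8 units.
print(round(overall_speedup(4 / 5, 5), 2))   # 2.78
```

The same formula shows why further gains must come from arithmetic: even with infinitely fast addressing, the overall gain cannot exceed 1 / 0.2 = 5:1.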

A language allowing free assignment of pointers (reference variables) entails potentially serious support overheads in the assignment and recovery of space, not necessarily eliminated by the provision of a large virtual store. Even if the SIL recognises pointers it seems preferable for the language subsystem to undertake its own space management to take advantage of known local characteristics. The language 'pointer' is evaluated in terms of the underlying program structure at the time of use: that operation occurs frequently and benefits from processor adaptation to the extent that once an evaluation has been carried out the result can be used repeatedly on successive items of data. It is then required of the SIL to allow language interpreters to work with 'absolute' as well as virtual addresses. In the next subsection we shall see what that implies. (The alternative of having both the SIL and the language microcode work in a virtual space supported by hardware can be disregarded because of the delay in accessing memory and the poor store utilization that results.)
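The benefit of evaluating a language pointer once and reusing the result can be shown with a counter. In this hypothetical Python sketch a pointer is a (segment, offset) pair, `Translator` stands in for whatever mapping the interpreter performs, and `sum_items` walks successive items from a single evaluation; the names and the flat `memory` list are assumptions for illustration.

```python
class Translator:
    """Maps language pointers (segment, offset) to absolute addresses and
    counts the look-ups performed."""

    def __init__(self, segment_bases):
        self.segment_bases = segment_bases
        self.lookups = 0

    def absolute(self, segment, offset):
        self.lookups += 1
        return self.segment_bases[segment] + offset


def sum_items(memory, translator, segment, offset, count, width=1):
    """Add `count` successive items, evaluating the pointer only once."""
    address = translator.absolute(segment, offset)   # one evaluation...
    total = 0
    for _ in range(count):                           # ...reused for every item
        total += memory[address]
        address += width
    return total


memory = list(range(100))
tr = Translator({7: 50})            # hypothetical: segment 7 starts at word 50
print(sum_items(memory, tr, 7, 0, 4), tr.lookups)   # 206 1
```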

Space management functions are principally concerned with searching for and updating pointers and physically moving blocks of data. They are time consuming and in many languages their use is discouraged by artificial means, so the gain from making them more efficient would be seen in program flexibility (in the user language and the SIL) rather than in execution time.
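The pointer-updating half of space management can be illustrated with a sliding compaction. The Python sketch below shows a generic technique, not a description of any particular SIL: live blocks are slid to the bottom of store in address order, and every pointer is relocated by finding the block that contains it. The function name is hypothetical and copying the data itself is omitted.

```python
def compact(blocks, pointers):
    """Slide live blocks to the bottom of store and relocate pointers.

    blocks:   list of (address, size) for the live blocks
    pointers: absolute addresses, each lying inside some live block
    Returns (new_blocks, new_pointers); copying the data is not shown.
    """
    moves, free = [], 0
    for address, size in sorted(blocks):       # keep the original address order
        moves.append((address, size, free))
        free += size

    def relocate(p):
        # Searching for the containing block is the costly part in practice.
        for old, size, new in moves:
            if old <= p < old + size:
                return new + (p - old)
        raise ValueError(f"pointer {p} is not inside a live block")

    return [(new, size) for _, size, new in moves], [relocate(p) for p in pointers]


print(compact([(100, 10), (40, 20)], [45, 105]))   # ([(0, 20), (20, 10)], [5, 25])
```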

(iii) Size of Support System

The SIL code benefits in two ways: in many situations (e.g. in compiling to language-oriented code) it has to do less; and it does it more efficiently than other high level system programming languages, or more elegantly than a macroassembler. Size reductions in the region of 5:1 have been achieved for compilers. Each language microcode represents a space overhead of at least 10 Kbytes, plus a similar amount for the resident SIL.

(iv) Object Program Size

Tailoring the object code to fit the source language shows the clearest gains over conventional systems because of the elimination of unnecessary function, register and address bits. An overall reduction in procedure size of 4:1 for large programs, including attribute tables, would be a realistic aim. No significant gains in data mapping over a conventional system with word and character addressing can be expected. Gains in space can be seen as gains in main memory and channel capacity and, to a smaller extent, in file space.

(v) Diagnostic Aids

As any APL user discovers, interpretive methods can give exceptionally good diagnostic information, sufficient to overcome eccentricities of the language itself. Unfortunately, diagnostic quality is a parameter that cannot be measured, and it is often overlooked in favour of marginal improvements in the others.

4.2 Microsystem Problems

The use of microprogram brings its own problems, and raises the question of whether the implied comparison with machines of the mid-60's was the correct one to use. In the system context, the obstacles to using interpretive microprogram are as follows.
(A) Range Definition

The microprogram appropriate to a high performance machine is quite different from that of a slower microprocessor. There is also an absolute speed limitation: a machine executing target instructions at 10 MIPS is obeying microorders at least 10 times as fast, i.e. at 100 million or more per second, which is beyond the power of vertically encoded (i.e. easily programmed) host machines.

(B) Security

Microprogram derives part of its speed advantage by ignoring the security checks inherent in fixed instruction sets. For a small amount of microprogram under control of the manufacturer that is tolerable. However, the language performance figures obtained in practice give the interpreter responsibility for resources normally regarded as protected, i.e. absolute addresses, in which case the security of the system is in the hands of language implementors.

(C) Flexibility

Microprogram is a static form of code. It cannot easily be moved in store. Fast control memories and scratchpads are necessarily small, so the problems of sharing resources between interpreters and scheduling their use have to be solved.

Of the above, (B) alone is sufficient to prevent widespread use of microprogram in commercial systems. Four types of response can be recognised:

1. Embed the Microprogram in a Conventional System

We have already noted that the space and time advantages are diluted in the context of a conventional system; nevertheless, those that remain are obtained with minimum investment in redesign. The IBM APL Assist Feature running under DOS/VS, OS/VS1 and OS/VS2 has been made available on the System/370 Models 135, 138, 145 and 148 (Hassitt and Lyon (1976)). It consists of an additional 20 Kbytes of microprogram, resident in main store, which interprets APL statements. It carries out virtual-to-real address translation according to the rules of the host system, but returns control to the host to service interrupts and page faults. Hence, system integrity depends upon correct use of addresses in the APL microcode.

2. Extend Security Boundaries to the Microprogram Level

The in-line checks that can be used without impairing performance are restricted to key comparison, lockout on fixed-size blocks of store, etc. The E1 emulator provides write protection on 16-word frames of scratchpad, 64-word frames of control memory, 16-Kword frames of main memory and all I-O multiplex positions.

The main drawback to such schemes is their inaccuracy and the difficulty encountered in handling dynamically changing or moving programs, which occur quite frequently in modern systems.
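A minimal Python sketch of frame-based write protection shows both the cheapness of the in-line check (one flag look-up per write) and the inaccuracy mentioned above: protection can only be granted in whole frames, so neighbouring words are protected too. The class name and interface are hypothetical; only the 16-word frame size is taken from the E1 scratchpad figures.

```python
class FrameProtectedStore:
    """Word-addressed store with one write-protect flag per fixed-size frame."""

    def __init__(self, words, frame_size=16):
        self.mem = [0] * words
        self.frame_size = frame_size
        frames = (words + frame_size - 1) // frame_size
        self.protected = [False] * frames

    def protect(self, first_word, last_word):
        """Protect every frame touched by the range, which may cover more
        words than were asked for."""
        for frame in range(first_word // self.frame_size,
                           last_word // self.frame_size + 1):
            self.protected[frame] = True

    def write(self, address, value):
        # The in-line check: a single flag look-up per write.
        if self.protected[address // self.frame_size]:
            raise PermissionError(f"write to protected frame at word {address}")
        self.mem[address] = value


store = FrameProtectedStore(64)
store.protect(20, 23)   # protect a 4-word object...
store.write(15, 1)      # ...frame 0 is still writable
# store.write(30, 1) would raise PermissionError: word 30 shares frame 1 with the object
```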

3. Control Address Formation in Microcode

An alternative, which can be seen as a generalisation of the first approach, is to validate addresses when they are formed, then to restrict their use so that further checks are unnecessary. The SIL is responsible for forming addresses (from segment capabilities); the language microcode can modify them within given limits and access the store directly. Addresses are distinguished by tags so that the SIL can find and update them when necessary, independent of the source language. This method is used in the Variable Computer System (Iliffe and May (1974)) on the E1 emulator, which makes provision for tag manipulation. For complete security, however, specialised hardware support is necessary.
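The division of labour in this approach can be sketched as follows: the SIL validates a capability once when it forms an address, the language microcode may then move the address only within its limits, and store accesses carry no further check. In this hypothetical Python sketch `Address`, `form_address` and the capability table are invented names; Python cannot stop other code from constructing an `Address` directly, which is exactly the job the tag (and, for complete security, hardware) does in VCS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Stands in for a tagged address; in VCS the tag, not a Python type,
    marks the word as an address that only the SIL may create."""
    segment: int
    base: int
    limit: int
    offset: int = 0

    def moved(self, delta):
        # Language microcode may modify an address, but only within its limits.
        new_offset = self.offset + delta
        if not 0 <= new_offset < self.limit:
            raise IndexError("address modification outside segment limits")
        return Address(self.segment, self.base, self.limit, new_offset)

    def absolute(self):
        # No further check on access: validity was established when formed.
        return self.base + self.offset


def form_address(capabilities, segment):
    """SIL function: validate the capability once and form an address."""
    if segment not in capabilities:
        raise PermissionError(f"no capability for segment {segment}")
    base, limit = capabilities[segment]
    return Address(segment, base, limit)


caps = {3: (1000, 64)}        # hypothetical: segment 3 at byte 1000, 64 bytes long
a = form_address(caps, 3)
print(a.moved(10).absolute())  # 1010
```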

4. Separate the Language Processors Physically

This is a special case of the second approach, and is attractive because the technology is available in the form of low-cost microprogrammable machines. The separation is conceptually physical, in the form of multiple processor-memory pairs, but it could be achieved by time-slicing.

From the general design viewpoint either of the last two approaches can be used to provide a viable system model. Each is intended to cover a wide range of performance by using multiple computers. From approach 3 it can be seen that, because access to program space is controlled, the SIL and user programs can coexist in the main memory and control store (if it exists), and that programs can be distributed over the available memory space. This 'distributed program' model is well suited to the class of applications with dynamically changing program requirements, or which can be expressed in terms of cooperating parallel processes.

From approach 4 a more specialised 'dedicated-language' model is derived. Each program, together with its interpreter, has unrestricted use of the local memory space of a processor-memory pair during execution, but it is rolled in and out by the scheduler, which forms part of the SIL. The SIL microcode and system procedures can be protected by holding them in read-only memory. Access to shared data or to overlays must be through some form of secondary store manager, which checks the rights of the user against the declared accessibility of the data, a relatively slow operation. The disadvantages of the dedicated-language model are the sensitivity of programs to physical store sizes, the amount of unproductive traffic between central (i.e. secondary) memory and language processors, and the poor utilization of processor and memory resources (if it is argued that processors and memory are give-away items, why bother with microprogram at all?). Nevertheless, such a system is in many ways the easiest to understand, it is least affected by failure of one of the processor-memory pairs, and it lends itself to the 'personal computer' mode of working in the same way that private cars lend themselves to private transport, however inefficient.

Each model presupposes the use of a system implementation language (SIL) whose aim is to provide a set of functions that can be used in all language applications to reduce development effort and code duplication at both micro- and target machine levels. In so doing it sets standards that can also be used in the variable part. There is no doubt that certain operations such as input-output and frequently used arithmetic procedures are properly part of the SIL. How far one can go depends on the type of system: if the integrity of system data cannot be guaranteed (which is the case for dedicated-language models) the amount of support the SIL can give is limited. On the other hand, commitment of the SIL to support facilities that are rarely used complicates the system and wastes resources. The interesting design area is thus the 'fringe' of functions just inside or just outside the SIL, which I can best illustrate by reference to the Variable Computer System developed on the E1 research emulator and later transferred to another host machine.

4.3 An Example of a SIL: The Variable Computer System

VCS is implemented at two levels of control: microprogram and the system target language (VCSL) in which all compilers and system utilities are written. The VCS procedures can be called either at microcode or at machine code level. It follows that if a microprogrammed procedure is called from machine level, or vice versa, some code must be obeyed to adapt from one level to the other. It is undesirable to impose restrictions at this point because one cannot always predict whether a procedure will be committed to microprogram; the discrimination must be dynamic or, at worst, made immediately before task initiation. For that reason the list of procedure activations associated with any process contains both micro and machine level linkage information. Again, it is undesirable to impose limits on the depth of procedure call, therefore linkage information is stacked in main memory, the host machine link stack having very limited use.

Procedure activations form part of the process state vector (PSV), which also contains VCS registers, environment pointer, current program pointer and various flag bits that are mapped into the host registers. As calculation proceeds it is possible that other host registers will be used, but it is required that all state information will be contained in the PSV at points where a change of procedure or process may occur. In that way the VCS can effect process management without explicit knowledge of the language state, and with a fair degree of independence of the host machine. Similarly, by recognising tagged addresses the VCS can carry out store management without explicit declaration of the mapping used in current processes.

Procedure entry and exit are controlled through a dynamic chain of marked links. The purpose of the marks is to distinguish task initiation, system call and user procedure calls, allowing various levels of restart to be employed and providing excellent diagnostics at both control levels.
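How marks on the dynamic chain support restart and diagnostics can be sketched briefly. In this hypothetical Python model the chain is a list of (mark, procedure name) pairs, oldest first; `Mark`, `restart` and `diagnostic_trace` are invented names, and treating a restart as "unwind to the most recent link with the requested mark" is an assumption about what a restart level means.

```python
from enum import Enum


class Mark(Enum):
    TASK = "task initiation"
    SYSTEM = "system call"
    USER = "user procedure"


def restart(chain, level):
    """Unwind the dynamic chain to the most recent link marked `level`.

    The marked link itself is kept so that execution can resume from it."""
    for i in range(len(chain) - 1, -1, -1):
        if chain[i][0] is level:
            return chain[: i + 1]
    raise LookupError(f"no {level.value} link on the chain")


def diagnostic_trace(chain):
    """Diagnostic listing, most recent call first."""
    return [f"{mark.value}: {name}" for mark, name in reversed(chain)]


chain = [(Mark.TASK, "job"), (Mark.USER, "main"),
         (Mark.SYSTEM, "read"), (Mark.USER, "callback")]
print(restart(chain, Mark.SYSTEM)[-1][1])   # read
print(diagnostic_trace(chain)[0])           # user procedure: callback
```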

The interpretation to be placed on a program segment is indicated by a control type assigned to a particular compiler. Control type zero is used for pure data: any attempt to obey it will fail. Control type 1 is for system use, type 2 for VCSL target code, and type values for language extensions, e.g. to Cobol, APL, etc., are assigned 3, 4, ... on a global basis. The control type is examined on procedure call and return (in the case of machine level code), branching to the appropriate interpreter.
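The branch on control type amounts to a table dispatch. This is a minimal Python sketch; `INTERPRETERS`, `enter` and the returned strings are hypothetical. Types 0, 1 and 2 follow the text, while giving type 3 to Cobol is only an example of a globally assigned language extension.

```python
def obey_data(segment):
    raise RuntimeError(f"segment {segment!r} has control type 0 (data) and cannot be obeyed")


# Hypothetical dispatch table: control type -> interpreter entry point.
INTERPRETERS = {
    0: obey_data,
    1: lambda segment: f"system interpreter obeys {segment}",
    2: lambda segment: f"VCSL interpreter obeys {segment}",
    3: lambda segment: f"Cobol interpreter obeys {segment}",
}


def enter(segment, control_type):
    """Examine the control type on procedure call or return and branch."""
    interpreter = INTERPRETERS.get(control_type)
    if interpreter is None:
        raise RuntimeError(f"no interpreter assigned to control type {control_type}")
    return interpreter(segment)


print(enter("PAYROLL", 3))   # Cobol interpreter obeys PAYROLL
```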

It can be seen that the PSVs are key control structures that must be protected if system security is to be ensured. The most efficient and flexible basis for protection is a capability system such as that of the Basic Language Machine. Many of the VCS functions are concerned with creating and manipulating abstract system objects in a consistent way, the PSVs being the representation of the abstract idea of a 'process'. In particular, we find functions for:

(i) setting up operating environments (bases) and defining the resources found in them;

(ii) creating, starting and stopping processes;

(iii) entering and leaving procedures; and

(iv) controlling access to resources.

Here a 'resource' is a storage segment, PSV, I-O device, or a set of resources. The recursive nature of this definition allows each base to be constructed as a tree. Clearly, the integrity of any object depends in the end on maintaining the integrity of its representation, i.e. the store, and of the procedures that are applied to it, i.e. the activation records contained in the PSVs.

Program structure is dynamic. A new base is able to share the information available to its 'parent' at the time of its creation, with the effect that a hierarchy of bases is set up with the 'system' at the apex. The base structure is important in building language subsystems and dependent application environments: Figure 5 shows a typical three-level base structure to which one or more Cobol modules might be attached.
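The recursive definition of a resource and the parent-to-child sharing of bases can be sketched together. In this hypothetical Python model a resource is either a leaf (a name standing for a segment, PSV or device) or a dict of resources, i.e. a set of resources. `Base`, `define`, `visible` and `count_leaves` are invented names, and reading "share the information available to its parent at the time of its creation" as a snapshot taken at creation is an assumption.

```python
class Base:
    """Node in a hypothetical model of the VCS base tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Share what the parent can see at the time of creation (a snapshot).
        self.inherited = dict(parent.visible()) if parent else {}
        self.local = {}

    def define(self, name, resource):
        self.local[name] = resource

    def visible(self):
        return {**self.inherited, **self.local}


def count_leaves(resource):
    """Count the primitive resources in a (possibly nested) resource set."""
    if isinstance(resource, dict):
        return sum(count_leaves(r) for r in resource.values())
    return 1


system = Base("system")
system.define("modules", {"kernel": "seg1", "spooler": "seg2"})
cobol = Base("Cobol subsystem", parent=system)
cobol.define("compiler", "seg3")
user = Base("user", parent=cobol)
system.define("later", "seg9")          # defined after the children were created
print(sorted(user.visible()))           # ['compiler', 'modules']
print(count_leaves(user.visible()))     # 3
```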


[Figure: base tree with levels SYSTEM BASE, LANGUAGE SUBSYSTEM BASE, LANGUAGE DEVELOPMENT BASE and USER bases. Other labels: system modules; compiler; run-time support; subsystem devices; processes; test programs and data; output. Contents of an attached Cobol module: Cobol object code, data buffers, record area pointers, data qualifiers, edit information, store mapping, index table, procedure variables, file descriptors, initialising code, debug control; encoded Cobol statements, from the Procedure Division and from the Data Division.]

Figure 5: VCS Base Hierarchy


Resources are defined by various types of capability, found in capability segments at the branch points of the program tree. The most time-critical VCS functions are those concerned with forming addresses from segment capabilities (codewords), and with using them to access memory. For system reasons a codeword refers indirectly to store via a global segment table (GST). The corresponding address retains the GST index in order to check the accessibility and position of the segment, which happens each time an address is loaded into a register (from the PSV). The access code is used to control shared (read-only) access by several processes or unique (update) access by individuals. All such control and conversion, together with the recycling of GST indices and memory, is exercised by VCS microprogram, which provides a good example of the application of microcode to system problems.

The 'read', 'write' and 'modify' instructions, which strictly speaking should be found on the VCS function list, are too critical to handle by microsubroutine call. Users are therefore allowed to issue them directly for binary data and trusted to observe the limit and protection codes.


CODEWORD:   [type] [GST index]

GST[g]:     [access control] [fbl] [limit]

ADDRESS:    [tag] [type] [GST index] [limit] [bl]
            (bl: absolute or relativised byte location)
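The formation of an address from a codeword via the GST can be sketched as below. This is a hypothetical Python model: the class and function names are invented, the tag is represented by the `Address` type rather than a bit field, `first_byte` is an assumed reading of the diagram's fbl field, and the address holds a relativised byte location. Keeping the GST index in the address is what lets the segment be re-checked, and moved by the store manager, without invalidating addresses already formed.

```python
from dataclasses import dataclass


@dataclass
class SegmentEntry:      # one Global Segment Table entry
    access: str          # "shared" (read-only) or "unique" (update)
    first_byte: int      # where the segment currently lies in store
    limit: int           # segment length in bytes


@dataclass(frozen=True)
class Codeword:          # segment capability
    kind: str
    gst_index: int


@dataclass(frozen=True)
class Address:
    kind: str
    gst_index: int       # retained so the segment can be re-checked and found
    limit: int
    byte: int            # relativised byte location within the segment


def load_address(codeword, gst, byte=0, update=False):
    """Form an address from a codeword, consulting the GST each time."""
    entry = gst[codeword.gst_index]
    if update and entry.access != "unique":
        raise PermissionError("segment is shared read-only")
    if not 0 <= byte < entry.limit:
        raise IndexError("byte location outside segment limit")
    return Address(codeword.kind, codeword.gst_index, entry.limit, byte)


def absolute(address, gst):
    return gst[address.gst_index].first_byte + address.byte


gst = {5: SegmentEntry("shared", 2048, 100)}
cw = Codeword("data", 5)
a = load_address(cw, gst, byte=10)
print(absolute(a, gst))       # 2058
gst[5].first_byte = 4096      # the store manager moves the segment
print(absolute(a, gst))       # 4106
```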


In the course of design numerous candidates for positions in the VCS function list have to be considered. A fundamental problem in extending the system is to achieve valuable effect without degrading overall performance. Sometimes a microcode branch is obtained 'for free', while at other times a new facility entails extra tests in a critical path. The available control store in a range of host machines has also to be considered. Options considered in that light are:


(i) selection of set elements by key rather than index value;

(ii) provision of paging facilities;

(iii) static chaining in the procedure activation list;

(iv) introduction of a third segment type consisting of a set of tagged elements;

(v) use of semaphore variables for interprocess communication.


There are many possible variations of the addressing rule, such as (i) and (ii), but each entails a loss of space or time that skilled programmers will try to circumvent. The best programming environment appears to be a set of dynamically constructed, variable-sized segments: they make optimal use of store and their access overheads are well understood. It is left to subsystem designers to map programs efficiently onto the tree structure, so that the store management implicit in a language such as APL is carried out in part by the language subsystem (which is aware of the details of APL usage) and in part by VCS functions which provide the containers for the APL workspaces.

VCS procedures are not intended to represent high level control structures directly, though they happen to be adequate for VCSL and simple languages such as Fortran. Recognition of static levels involves extra work in procedure management and a variety of actions dealing with special cases that could not be built into a fixed system, so it is intended that such structures be mapped by the language microcode into simulated control stacks.

It seemed probable that mapping a display structure such as those found in Algol-derived languages would benefit from the ability to manipulate sets of addresses, but the practical implementations studied so far have used indirect mapping techniques, i.e. a new form of 'pointer' peculiar to the language is invented and mapped dynamically onto the VCS structures (cf. the Data Qualifiers in Cobol). The advantage of such techniques is that they can take account of language parameters in the design of pointers, but we noted earlier that 20 or more microsteps may be taken to reconstruct the absolute VCS address.

Finally, various forms of semaphore signalling were considered, but only a minimal 'busy' flag was implemented in the PSV. The argument against greater elaboration is that the access mechanism of the Global Segment Table already provides direct control over shared resources, associating the control variable with the resource itself, so there is little point in providing more obscure functions to the same end. The release of a segment for rescheduling at the end of a critical section is not automatic: to force it at procedure exit, for example, would again imply intolerable overheads, so an explicit VCS Release function is required.
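The point about the Global Segment Table can be modelled as follows: the control variable lives in the table entry for the shared segment, and release happens only when explicitly requested. This is a toy sketch; `GlobalSegmentTable`, `access` and `release` are hypothetical names, and the PSV 'busy' flag, whose details are not given here, is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SegmentEntry:
    owner: Optional[str] = None   # process currently holding the segment, if any

    @property
    def busy(self) -> bool:
        return self.owner is not None


class GlobalSegmentTable:
    """Toy model: the control variable is stored with the shared segment itself."""

    def __init__(self) -> None:
        self.entries: Dict[int, SegmentEntry] = {}

    def access(self, seg: int, process: str) -> bool:
        """Try to enter the critical section on `seg`; False means the caller must wait."""
        entry = self.entries.setdefault(seg, SegmentEntry())
        if entry.busy and entry.owner != process:
            return False
        entry.owner = process
        return True

    def release(self, seg: int, process: str) -> None:
        """Explicit release; nothing calls this automatically at procedure exit."""
        entry = self.entries[seg]
        if entry.owner != process:
            raise PermissionError("segment not held by this process")
        entry.owner = None
```

A second process is refused access until the holder calls `release`, which mirrors the need for an explicit VCS Release function rather than an implicit one at procedure exit.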

The Variable Computer System provides support for language-oriented microprograms in easily portable form: an investment of about 8 Kbytes of microcode transfers the VCS functions, VCSL support codes, compilers, utilities, etc. to a new host machine. It provides the type of support which is needed if the advantages of microcode are to be fully realised for each language, and although the function list could be improved in the light of experience I think it is a sound method of exploiting the current generation of general purpose emulators, acknowledging that system security rests on the correct design of language interpreters.

4.4 Future Developments

Careful choice of words has left the most critical question unanswered: leaving aside short-term expedients, is a general purpose host machine with two levels of writable control the best starting point for processor design? I think not, for three reasons.

Firstly, the arguments that have been used are based on measures of high level language implementation, whereas a substantial part of information processing still lies outside that well-defined area. Several systems of mediocre performance and limited applicability have resulted from the assumption that a high level language or set of languages would cover the field. On the other hand, without the formality of high level constructs it is difficult to see how to make use of writable control memory.

But even accepting the limitations of high level languages, it can still be argued that the interpretive approach is not optimal in many instances and that the system problems outlined earlier have still not been solved. It has to be shown that there is a better approach to language implementation with the range and flexibility of conventional systems. We begin by drawing a distinction between the inherent coding advantages of microprogrammed interpretation and the benefits which result from using fast storage or ducking behind the range architecture.

Microprogrammed interpreters have improved on fixed, complex target instruction sets to the extent that much of the redundant information in the instruction stream has been eliminated. The figures given earlier show a reduction from 500 to 200 microsteps for the average Cobol statement, or a reduction from 15:1 to 6:1 in the ratio of support steps to useful arithmetic and logic.
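The two sets of figures can be cross-checked with a little arithmetic. Assuming both ratios refer to the same average statement, a support:useful ratio of r:1 means the useful steps are total/(r + 1). The useful work then comes out at roughly 30 microsteps in both cases (500/16 ≈ 31, 200/7 ≈ 29), so practically the whole saving is in support steps (about 469 falling to about 171).

```python
from typing import Tuple


def split_steps(total: float, support_per_useful: float) -> Tuple[float, float]:
    """Split a microstep total into (support, useful) for a support:useful ratio of r:1."""
    useful = total / (support_per_useful + 1)
    return total - useful, useful


before = split_steps(500, 15)   # fixed, complex target instruction set
after = split_steps(200, 6)     # microprogrammed interpreter
print(f"before: support {before[0]:.1f}, useful {before[1]:.1f}")
print(f"after:  support {after[0]:.1f}, useful {after[1]:.1f}")
```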

That suggests there is still room for improvement, which might be found in a hybrid form of control in which in-line and interpretive methods can be mixed. After all, an interpreter is simply a means of calling a subroutine from the target instruction stream: its weakness is that the interpretive overhead is paid on every syllable. In other words, if we think in terms of an 8-bit function syllable, 128 codes might be assigned to hard-wired functions, the other 128 to procedure entries in a variable 'control environment'.
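The split of the 8-bit syllable space can be sketched as a dispatch loop: codes 0 to 127 run a fixed function directly, while codes 128 to 255 index a procedure held in a variable control environment. The particular hard-wired operations (`op_push1`, `op_add`, `op_dup`) and the stack model are hypothetical stand-ins chosen only to make the example run.

```python
from typing import Callable, Dict, List

Stack = List[int]


# Hypothetical hard-wired functions occupying part of codes 0-127.
def op_push1(stack: Stack) -> None:
    stack.append(1)


def op_add(stack: Stack) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_dup(stack: Stack) -> None:
    stack.append(stack[-1])


HARDWIRED: Dict[int, Callable[[Stack], None]] = {0x01: op_push1, 0x02: op_add, 0x03: op_dup}


def run(code: bytes, control_env: Dict[int, bytes], stack: Stack) -> Stack:
    """Execute 8-bit syllables: codes below 128 are hard-wired, codes 128 and
    above are monosyllabic calls on entries in the control environment."""
    for syllable in code:
        if syllable < 128:
            HARDWIRED[syllable](stack)          # stands in for direct execution
        else:
            body = control_env[syllable - 128]  # procedure entry; only these pay lookup
            run(body, control_env, stack)
    return stack
```

With a control environment whose entry 0 is "duplicate, add" (double the top of stack), the syllables `01 01 02 80 80` compute 1 + 1 and then double it twice, leaving 8.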

The starting point I suggest is that each language should be analysed from the point of view of minimising the product of microsteps and space in the representation of programs, covering both instruction and descriptor decoding. I expect, though I do not know of a fully tested example, that the best code a compiler can produce will be a mixture of microsteps and monosyllabic procedure calls. In other words, the separation into 'interpreter' and 'target' code is no longer relevant.
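One way to picture the suggested criterion is a compiler choosing, for each operation, between expanding it in-line as microsteps and emitting a one-syllable procedure call, so as to minimise total microsteps times total space. The cost table below is invented purely for illustration. Because a product of totals does not decompose per operation, this sketch simply enumerates every choice, which is only practical for tiny programs.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical costs per operation: (microsteps, bytes of program space).
# "inline" expands the operation as microsteps; "call" is a one-syllable
# procedure call that pays a linkage overhead in microsteps.
COSTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "load": {"inline": (2, 3), "call": (6, 1)},
    "add": {"inline": (1, 1), "call": (5, 1)},
    "move": {"inline": (12, 18), "call": (16, 1)},
}


def best_encoding(program: List[str]) -> Tuple[int, Tuple[str, ...]]:
    """Exhaustively choose inline/call per operation to minimise steps x space."""
    best: Optional[Tuple[int, Tuple[str, ...]]] = None
    for choice in product(("inline", "call"), repeat=len(program)):
        steps = sum(COSTS[op][c][0] for op, c in zip(program, choice))
        space = sum(COSTS[op][c][1] for op, c in zip(program, choice))
        cost = steps * space
        if best is None or cost < best[0]:
            best = (cost, choice)
    assert best is not None
    return best
```

With these made-up costs, the program `load, add, move` is cheapest as call, in-line, call (23 steps × 3 bytes = 69), beating both all in-line (15 × 22 = 330) and all calls (27 × 3 = 81), which illustrates the kind of mixed code the text anticipates.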

The problem of presenting the control stream to the processor at high speed cannot be solved by committing the entire interpreter to control memory, because it is now diffused through the program space. As it happens, it was not at all clear how to do that in a flexible manner for a general purpose multilanguage system. The conversion of 'microsteps' to 'nanoseconds' can best be treated in the broader context of speeding up memory access rates: look ahead, use cache buffers, or in the last resort pay more, but do not attempt to deal specifically with the restrictions of control memory or scratchpad. It will be noted in passing that for the multicomputer architectures envisaged the path from memory to processor is shorter than that of a centralised system with shared store highways; therefore the benefit of high speed control memory would be less marked.
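The general-purpose route recommended here can be illustrated with the standard effective-access-time formula and a toy direct-mapped cache fed by a fetch trace. The cache geometry, the trace (a 64-syllable loop executed ten times) and the timings in arbitrary units are assumptions for illustration, not figures from these lectures. The point is that a cache captures frequently used procedure bodies wherever they lie in program space, without a dedicated control memory.

```python
from typing import Iterable, List, Optional


def effective_access_time(hit_ratio: float, t_cache: float, t_memory: float) -> float:
    """Mean fetch time when a fraction hit_ratio of fetches is served by the cache."""
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_memory


def direct_mapped_hits(addresses: Iterable[int], n_lines: int, line_size: int) -> float:
    """Hit ratio of a direct-mapped cache for a trace of fetch addresses."""
    tags: List[Optional[int]] = [None] * n_lines
    hits = total = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % n_lines, block // n_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag   # miss: fill the line from main memory
        total += 1
    return hits / total if total else 0.0


trace = [a for _ in range(10) for a in range(1000, 1064)]
h = direct_mapped_hits(trace, n_lines=16, line_size=8)
print(h, effective_access_time(h, t_cache=1.0, t_memory=10.0))
```

Here only the first pass misses, once per 8-syllable line, so 632 of 640 fetches hit and the mean fetch time is close to the cache time.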

Returning to system problems, we are left with (A) range cover, which it was (and still is) hoped to achieve using multiple computers, and (B) security. The dedicated-language system is not affected by the use of hybrid control: no assumptions are made about program security. The distributed-program system does depend on controlled address formation, which was achieved in the Variable Computer System by a policy of trusting the language subsystems. With hybrid control it becomes imperative to have hardware-enforced protection. It is also the case that many of the key VCS functions at present implemented by microsubroutine calls could be implemented by in-line code.
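Once compiled code contains in-line microsteps, address formation can no longer be left to trusted interpreters, so every (segment, offset) reference must be checked whoever generated it. The sketch below models such a check in software; `Descriptor`, `form_address` and `ProtectionFault` are hypothetical names, and in the scheme argued for here the equivalent checks would be performed by hardware on each reference.

```python
from dataclasses import dataclass
from typing import Dict


class ProtectionFault(Exception):
    """Raised where a hardware check would trap."""


@dataclass(frozen=True)
class Descriptor:
    base: int        # physical start of the segment (hypothetical)
    length: int      # number of addressable units
    writable: bool   # whether stores are permitted


def form_address(table: Dict[int, Descriptor], seg: int, offset: int, write: bool = False) -> int:
    """Check a (segment, offset) reference against its descriptor before
    forming a physical address, so code cannot reach outside its segments."""
    desc = table.get(seg)
    if desc is None:
        raise ProtectionFault(f"segment {seg} not accessible")
    if not 0 <= offset < desc.length:
        raise ProtectionFault(f"offset {offset} outside segment {seg}")
    if write and not desc.writable:
        raise ProtectionFault(f"segment {seg} is read-only")
    return desc.base + offset
```

A reference such as segment 1, offset 10 is translated normally, while an out-of-range offset, a store into a read-only segment, or a segment absent from the table all fault.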

The above discussion has been based on vaguely defined 'microsteps' comparable with the vertical microinstructions of present-day machines. The reader may feel concerned at reverting to a processor style not far removed from that of twenty years ago. Is there a danger of inventing more and more complex microsteps and repeating the evolutionary cycle that led to the IBM System/360 and other 'range' architectures? The return in space that can be expected from more complex instructions depends on finding frequently repeated digrams or n-grams that can be suitably packaged. They are more likely to occur in arithmetic, where 'hardened' floating point and decimal operations can be expected, than in control sequences. It would not be surprising to see the host arithmetic functions develop in the direction of current machine codes (with type interpretation placed on descriptor or tag fields), but the many modes of data access appear to benefit very little from complex addressing rules.
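Finding candidates for packaging amounts to counting adjacent groups of operations in a program or execution trace. The following is a minimal sketch using `collections.Counter`; the trace is hypothetical and serves only to show the counting, not any measured frequency.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def frequent_ngrams(trace: Sequence[str], n: int = 2, top: int = 3) -> List[Tuple[Tuple[str, ...], int]]:
    """Count adjacent n-grams in an operation trace and return the most common."""
    grams = zip(*(trace[i:] for i in range(n)))   # sliding window of width n
    return Counter(grams).most_common(top)


trace = ["load", "mul", "add", "store", "load", "mul", "add",
         "load", "mul", "add", "branch"]
print(frequent_ngrams(trace, n=3, top=1))
```

In this made-up trace the trigram load, mul, add recurs three times and would be the obvious candidate for a single multiply-and-accumulate step, the kind of arithmetic packaging the text expects to pay off.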